Faster way to do this?

andrew cooke andrew at acooke.org
Wed May 21 10:38:09 EDT 2003


Freddie <oinkfreddie at oinkmadcowdisease.oinkorg> writes:
> 'Skip Montanaro' sent me an e-mail with a way (I think) of generating the 
> list of words, and it's actually a lot faster, for _small input lengths_. As 
> the length of the input increases, my version's run time seems to increase 
> fairly linearly, while Skip's increases quite scarily. Some timings, mine all 
> using my method2():

skip's method tests each permutation of the letters; yours tests each word
in the dictionary.  the number of permutations of n letters is n!, and
there are 45,000 words in the dictionary, so the two methods should be
comparable when n! is around 45,000 - that is, when n is about 8
characters (8! = 40,320).

> 7 letters: 'hellosi'
> mine - found 34 words in 0.93s
> Skip - found 34 words in 0.56 seconds

in practice they turn out to be comparable at around 7 characters rather
than 8, presumably because skip's method has more overhead per permutation.

note that skip's method scales much better as the dictionary size
increases - its time stays almost constant.  your method isn't that bad
either: its time is proportional to the dictionary size, which is nothing
like as badly behaved as skip's method, whose time grows factorially with
the number of input letters.
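
in outline, the two approaches look something like this (the names are
mine and i'm guessing at the details of both skip's code and your
method2 - treat it as a sketch, not either actual implementation):

from itertools import permutations
from collections import Counter

def by_dictionary(letters, words):
    # "your" method: test every dictionary word against the available
    # letters.  time is roughly proportional to the number of words.
    have = Counter(letters)
    return sorted(w for w in words if not (Counter(w) - have))

def by_permutation(letters, words):
    # "skip's" method (as i understand it): generate every permutation
    # of every length and look each one up in a set of words.  time
    # grows factorially with the number of input letters.
    wordset = set(words)
    found = set()
    for n in range(1, len(letters) + 1):
        for p in permutations(letters, n):
            candidate = ''.join(p)
            if candidate in wordset:
                found.add(candidate)
    return sorted(found)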

you might be able to find a compromise method that divides words into
two groups by length.  as you read the words in, put short words in one
hash table and long words in another.  then scan the long words using
your method (quick because that table should be fairly small) and scan
the short words using a modified version of skip's method (quick because
you cut the permutation generation off once it reaches the maximum
short-word length).  from the timings above, a reasonable cutoff for a
"short" word might be 6 or 7 characters.
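
putting that together, a hybrid might look roughly like this (MAX_SHORT
and the function name are made up, and the cutoff would want tuning
against real timings):

from itertools import permutations
from collections import Counter

MAX_SHORT = 6  # guessed cutoff between "short" and "long" words

def hybrid(letters, words):
    short = set(w for w in words if len(w) <= MAX_SHORT)
    long_words = [w for w in words if len(w) > MAX_SHORT]
    have = Counter(letters)
    # long words: scan the (small) long-word list, your method
    found = set(w for w in long_words if not (Counter(w) - have))
    # short words: skip's method, but permutations are cut off at
    # MAX_SHORT letters instead of running to the full input length
    for n in range(1, min(MAX_SHORT, len(letters)) + 1):
        for p in permutations(letters, n):
            w = ''.join(p)
            if w in short:
                found.add(w)
    return sorted(found)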

andrew

-- 
http://www.acooke.org





