[Tutor] anagrams
Gregor Lingl
glingl at aon.at
Wed Nov 5 06:32:49 EST 2003
Hi!
Triggered by a remark on the DailyPythonUrl I looked for the
Kata - Site and found an interesting little problem: finding
anagrams among the words in a textfile.
KataSix
http://pragprog.com/pragdave/Practices/Kata/KataSix.rdoc,v
The author tells us of a 25-line Ruby-Solution with a performance
of 1.5 secs on his 1 Ghz PPC. (The wordfile can be found on his
website)
I found a Python Solution, which needs 0.75 secs on my 2.3 GHz PC,
but it certainly is not optimal. (Maybe someone here likes to find
a better one?)
Another problem remains for me: My script finds 2531 groups
of anagrams with 5683 words alltogether, whereas the above
mentioned solution has only 2,530 sets of anagrams (and a total of
5,680 words). So I suspect, that my solution contains a bug, but
I couldn't find it (until now).
Here the code:
"""KataSix: Find groups of anagrams in a wordlist
http://pragprog.com/pragdave/Practices/Kata/KataSix.rdoc,v
Solution making heavy use of Tim Peters marvelous
sort - implementation
"""
# get wordlist
wordlist = open("wordlist.txt").read().split()
# convert words to lists of characters, remember index in wordlist
listlist = [(list(word.lower()),i) for (i,word) in enumerate(wordlist)]
# first sort each characterlist, then the list of characterlists
for item in listlist:
item[0].sort()
listlist.sort()
# now anagrams form groups of consecutive characterlists - extract them:
lenwl = len(wordlist)
i, anagramgroups = 0, []
while i < lenwl:
anagramlist = [wordlist[listlist[i][1]]]
j = i+1
while j < lenwl and listlist[j][0] == listlist[i][0]:
anagramlist.append(wordlist[listlist[j][1]])
j = j+1
if len(anagramlist) > 1: anagramgroups.append(anagramlist)
i = j
# Output results:
print "anagram groups:", len(anagramgroups)
print "words:", sum([len(g) for g in anagramgroups])
###############
Remarks, comments, amendments?
Gregor
More information about the Tutor
mailing list