[Tutor] anagrams

Wed Nov 5 06:32:49 EST 2003

Hi!

Triggered by a remark on the DailyPythonUrl I looked for the
Kata - Site and found an interesting little problem: finding
anagrams among the words in a textfile.

KataSix

http://pragprog.com/pragdave/Practices/Kata/KataSix.rdoc,v

The author tells us of a 25-line Ruby-Solution with a performance
of 1.5 secs on his 1 Ghz PPC. (The wordfile can be found on his
website)

I found a Python Solution, which needs 0.75 secs on my 2.3 GHz PC,
but it certainly is not optimal. (Maybe someone here likes to find
a better one?)

Another problem remains for me: My script finds 2531 groups
of anagrams with 5683 words alltogether, whereas the above
mentioned solution has only 2,530 sets of anagrams (and a total of
5,680 words). So I suspect, that my solution contains a bug, but
I couldn't find it (until now).

Here the code:

"""KataSix: Find groups of anagrams in a wordlist
http://pragprog.com/pragdave/Practices/Kata/KataSix.rdoc,v

Solution making heavy use of Tim Peters marvelous
sort - implementation
"""
# get wordlist
wordlist = open("wordlist.txt").read().split()

# convert words to lists of characters, remember index in wordlist
listlist = [(list(word.lower()),i) for (i,word) in enumerate(wordlist)]

# first sort each characterlist, then the list of characterlists
for item in listlist:
    item[0].sort()
listlist.sort()

# now anagrams form groups of consecutive characterlists - extract them:
lenwl = len(wordlist)
i, anagramgroups = 0, []
while i < lenwl:
    anagramlist = [wordlist[listlist[i][1]]]
    j = i+1
    while j < lenwl and listlist[j][0] == listlist[i][0]:
        anagramlist.append(wordlist[listlist[j][1]])
        j = j+1
    if len(anagramlist) > 1: anagramgroups.append(anagramlist)
    i = j

# Output results:
print "anagram groups:", len(anagramgroups)
print "words:", sum([len(g) for g in anagramgroups])

###############

Remarks, comments, amendments?

Gregor