cataloging words in text file

Grant Edwards grante at
Sat Mar 3 00:24:16 CET 2001

In article <eFVn6.400$z6.34905 at>, Grant Edwards wrote:
>In article <v9Vn6.396$z6.33544 at>, Grant Edwards wrote:
>>>I remember this homework assignment for my data structures (c++)
>>>class: read in a large file, and create a data structure containing
>>>every word in the file and the number of times it appears.


>>You may want something a little more sophisticated
>A call to translate() with an appropriate translation table
>before the split() solves those issues.
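The issue translate() solves is easy to see with a short sketch for modern Python 3. The name TABLE and the sample sentence are illustrative only, and the conditional expression used to build the table did not exist in the Python of this thread. A plain split() leaves punctuation glued to words, so "file" and "file," would be counted as different words:

```python
# Build a 256-entry table: ASCII alphanumerics map to themselves,
# everything else (punctuation, whitespace, code points 128-255) to a space.
TABLE = "".join([c if c.isalnum() else " " for c in map(chr, range(128))]) + " " * 128

text = "Read a file, count every word; count it again."

plain = text.split()                    # punctuation stays attached
cleaned = text.translate(TABLE).split() # punctuation blanked out first

print(plain)    # ['Read', 'a', 'file,', 'count', 'every', 'word;', 'count', 'it', 'again.']
print(cleaned)  # ['Read', 'a', 'file', 'count', 'every', 'word', 'count', 'it', 'again']
```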

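import sys
# ASCII alphanumerics map to themselves; every other character,
# including bytes 128-255, maps to a space.
t = "".join([(" ",chr(x))[chr(x).isalnum()] for x in range(128)]) + " "*128
d = {}
# Input from stdin is assumed; the original line is cut off.
for w in sys.stdin.read().translate(t).split():
    if w in d:
        d[w] += 1
    else:
        d[w] = 1
print(d)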
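On later Pythons (2.7 and 3.1 onward) the standard library's collections.Counter does the dictionary bookkeeping of the loop above. The sketch below is a swapped-in alternative rather than the code from this post. It keeps the same translation table, and the helper name count_words is hypothetical:

```python
import collections

# Same table as the post: ASCII alphanumerics kept, everything else
# (including code points 128-255) mapped to a space.
TABLE = "".join([(" ", chr(x))[chr(x).isalnum()] for x in range(128)]) + " " * 128

def count_words(text):
    """Return a Counter mapping each word in text to its number of occurrences."""
    return collections.Counter(text.translate(TABLE).split())

counts = count_words("the cat and the hat; the end.")
print(counts["the"])  # 3
print(counts["end"])  # 1
```

To count a whole file from standard input, call count_words(sys.stdin.read()) after importing sys.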

I'm particularly proud of the obfuscated construction of the
translation table.  Looks like time spent reading those
threads on ?: wasn't wasted after all. ;)
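The table trick indexes a 2-tuple with a boolean. False counts as 0 and picks the space, and True counts as 1 and picks the character, because bool is a subclass of int. The sketch below compares it with the conditional expression that Python gained later, in 2.5. The function names are illustrative:

```python
def pick_tuple(c):
    # The post's idiom: (" ", c)[False] is " ", (" ", c)[True] is c.
    return (" ", c)[c.isalnum()]

def pick_conditional(c):
    # The same choice written as a conditional expression (Python 2.5+).
    return c if c.isalnum() else " "

ascii_chars = [chr(x) for x in range(128)]
print(all(pick_tuple(c) == pick_conditional(c) for c in ascii_chars))  # True
print(pick_tuple("a"), repr(pick_tuple(",")))                          # a ' '
```

One difference matters outside this example. The tuple form evaluates both alternatives before choosing, while the conditional expression evaluates only the branch it selects.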

Surprisingly, isalnum() returns true for any value larger than
chr(128).  I assume that's a characteristic of isalnum() in the
underlying libc.  Generally one uses isascii() to determine if
the call to isalnum() will return a meaningful value, but
Python doesn't expose isascii().
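An explicit range check gives the isascii() guard described above. A minimal sketch for Python 3 follows. There, str.isalnum() follows Unicode rather than the C library, so characters such as "é" count as alphanumeric. Python 3.7 also added str.isascii(). The helper name is_ascii_alnum is hypothetical:

```python
def is_ascii_alnum(c):
    # ord(c) < 128 plays the role of C's isascii(); isalnum() is only
    # consulted for ASCII characters.
    return ord(c) < 128 and c.isalnum()

print("\u00e9".isalnum())        # True  (é is alphanumeric in Unicode)
print(is_ascii_alnum("\u00e9"))  # False
print(is_ascii_alnum("Z"))       # True

# A full 256-entry table built with the guard, so no " "*128 padding is needed.
table = "".join([c if is_ascii_alnum(c) else " " for c in map(chr, range(256))])
print("caf\u00e9!".translate(table).split())  # ['caf']
```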

Grant Edwards                   grante             Yow!  Life is selling
                                  at               REVOLUTIONARY HAIR
