cataloging words in text file

Grant Edwards grante at visi.com
Fri Mar 2 18:24:16 EST 2001


In article <eFVn6.400$z6.34905 at ruti.visi.com>, Grant Edwards wrote:
>In article <v9Vn6.396$z6.33544 at ruti.visi.com>, Grant Edwards wrote:
>
>>>I remember this homework assignment for my data structures (c++)
>>>class: read in a large file, and create a data structure containing
>>>every word in the file and the number of times it appears.

[...]

>>You may want something a little more sophisticated
[...]
>A call to translate() with an appropriate translation table
>before the split() solves those issues.

----------------------------------------------------------------------
#!/usr/local/bin/python2.1
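import sys
# 256-entry translation table: ASCII alphanumerics map to themselves,
# everything else (including all of 128-255) maps to a space.
t = "".join([(" ",chr(x))[chr(x).isalnum()] for x in range(128)]) + " "*128
d={}    # word -> number of occurrences
for w in sys.stdin.read().translate(t).split():
    if d.has_key(w):
        d[w] += 1
    else:
        d[w] = 1
print d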
----------------------------------------------------------------------

I'm particularly proud of the obfuscated construction of the
translation table.  Looks like time spent reading those
threads on ?: wasn't wasted after all. ;)
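
For readers who skipped the ?: threads, the table construction can
be unpacked roughly as follows.  This is only a sketch in present-day
Python 3, not the 2.1 script above, so it assumes a bytes table for
bytes.translate(); the names build_table and count_words are made up
for illustration.

```python
def build_table():
    """256-byte table: ASCII alphanumerics map to themselves, all else to a space."""
    table = bytearray(b" " * 256)      # start with every byte mapped to a space
    for x in range(128):
        ch = chr(x)
        # (" ", ch)[ch.isalnum()] indexes the tuple with a bool:
        # False picks " " (index 0), True picks ch (index 1).
        table[x] = ord((" ", ch)[ch.isalnum()])
    return bytes(table)


def count_words(data):
    """Count whitespace-separated words in a bytes object after translation."""
    counts = {}
    for w in data.translate(build_table()).split():
        counts[w] = counts.get(w, 0) + 1
    return counts


print(count_words(b"spam, eggs; spam!"))
```

The tuple-indexing trick evaluates both alternatives before picking
one, which is harmless here because neither has side effects.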

Surprisingly, isalnum() returns true for any value larger than
chr(128).  I assume that's a characteristic of isalnum() in the
underlying libc.  That's why the table only tests the first 128
characters and maps the rest to spaces.  Generally one uses
isascii() to determine if the call to isalnum() will return a
meaningful value, but Python doesn't expose isascii().
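
The C idiom of guarding isalnum() with isascii() can be imitated by
checking the character code directly.  A minimal sketch, assuming
modern Python 3 and one-character strings; the function name
is_ascii_alnum is hypothetical.

```python
def is_ascii_alnum(ch):
    """True only for ASCII letters and digits; ch is a one-character str."""
    # Check the code point first, the way C code would call
    # isascii() before trusting isalnum().
    return ord(ch) < 128 and ch.isalnum()


for ch in ("a", "7", "_", "\xe9"):
    print(repr(ch), ch.isalnum(), is_ascii_alnum(ch))
```

The last sample character lies above 127: a bare isalnum() accepts
it, while the guarded version rejects it.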

-- 
Grant Edwards                   grante             Yow!  Life is selling
                                  at               REVOLUTIONARY HAIR
                               visi.com            PRODUCTS!


