cataloging words in text file
Grant Edwards
grante at visi.com
Fri Mar 2 18:24:16 EST 2001
In article <eFVn6.400$z6.34905 at ruti.visi.com>, Grant Edwards wrote:
>In article <v9Vn6.396$z6.33544 at ruti.visi.com>, Grant Edwards wrote:
>
>>>I remember this homework assignment for my data structures (c++)
>>>class: read in a large file, and create a data structure containing
>>>every word in the file and the number of times it appears.
[...]
>>You may want something a little more sophisticated
[...]
>A call to translate() with an appropriate translation table
>before the split() solves those issues.
----------------------------------------------------------------------
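#!/usr/local/bin/python2.1
import sys
# 256-entry translation table: ASCII alphanumerics map to themselves,
# everything else (including all characters >= 128) maps to a space.
t = "".join([(" ",chr(x))[chr(x).isalnum()] for x in range(128)]) + " "*128
d={}
# Blank out the punctuation, split on whitespace, and count each word.
for w in sys.stdin.read().translate(t).split():
    if d.has_key(w):
        d[w] += 1
    else:
        d[w] = 1
print d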
----------------------------------------------------------------------
I'm particularly proud of the obfuscated construction of the
translation table. Looks like time spent reading those
threads on ?: wasn't wasted after all. ;)
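
For anyone puzzling over that one-liner, here is a rough unrolled sketch of the same idea. It is written for Python 3 and works on bytes (the nearest thing to the 8-bit strings above), so it is an adaptation and not the script as posted; build_table and count_words are names made up for the illustration.

```python
def build_table():
    """Return a 256-byte table for bytes.translate()."""
    low = []
    for x in range(128):
        ch = bytes([x])
        # Indexing a two-item tuple with a boolean picks item 0 for False
        # and item 1 for True, so this reads "ch if ch.isalnum() else space".
        low.append((b" ", ch)[ch.isalnum()])
    # Bytes 128..255 are all mapped to a space.
    return b"".join(low) + b" " * 128


TABLE = build_table()


def count_words(data):
    """Count the words in a bytes object, treating non-alphanumerics as blanks."""
    counts = {}
    for w in data.translate(TABLE).split():
        counts[w] = counts.get(w, 0) + 1
    return counts
```

Something like count_words(sys.stdin.buffer.read()) would then play the role of the loop in the script.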
Surprisingly, isalnum() returns true for any value larger than
chr(128). I assume that's a characteristic of isalnum() in the
underlying libc. In C, one generally calls isascii() first to
determine whether isalnum() will return a meaningful value, but
Python doesn't expose isascii().
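
The missing isascii() check is easy enough to write by hand as a range test on ord(). Below is a small sketch of that guard, assuming Python 3 text strings, where isalnum() is also true for non-ASCII letters; is_ascii_alnum, table and split_words are hypothetical names chosen for the example.

```python
def is_ascii_alnum(ch):
    """True only for ASCII letters and digits in a one-character string."""
    # Test the range first, the way isascii() is used before isalnum() in C.
    return ord(ch) < 128 and ch.isalnum()


# Mapping for str.translate(): every code point below 256 that is not an
# ASCII alphanumeric becomes a space. Code points from 256 up are not in
# the mapping and pass through unchanged.
table = {x: " " for x in range(256) if not is_ascii_alnum(chr(x))}


def split_words(text):
    """Split text into words made only of ASCII letters and digits."""
    return text.translate(table).split()
```

With this guard an accented letter such as chr(233) is treated as a separator, which matches what the " "*128 padding does in the script.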
--
Grant Edwards                   grante             Yow!  Life is selling
                                  at               REVOLUTIONARY HAIR
                               visi.com            PRODUCTS!