Using dictionary key as a regular expression class
Terry Reedy
tjreedy at udel.edu
Sat Jan 23 02:45:41 EST 2010
On 1/22/2010 9:58 PM, Chris Jones wrote:
> On Fri, Jan 22, 2010 at 08:46:35PM EST, Terry Reedy wrote:
> Do you mean I should just read the file one character at a time?
Whoops, my misdirection (you can .read(1), but this is s l o w).
I meant to suggest processing it a char at a time.
1. If not too big,
for c in open(x, 'rb').read():  # I left the .read() off
# 'b' will get bytes, though ord(c) is the same for ascii chars,
# whether bytes or unicode
2. If too big for that,
for line in open(x):
    for c in line:  # or I left off this part
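Spelled out as a runnable sketch of the two options above, assuming Python 3. The function names, the latin-1 encoding (chosen only so that every byte decodes to one character), and the temporary sample file are illustrative choices, not part of the original suggestion.

```python
import os
import tempfile


def chars_whole_file(path):
    """Option 1: read the whole file into memory, then yield each character."""
    with open(path, encoding="latin-1") as f:
        for c in f.read():
            yield c


def chars_by_line(path):
    """Option 2: hold only one line in memory at a time."""
    with open(path, encoding="latin-1") as f:
        for line in f:
            for c in line:
                yield c


# Hypothetical sample file, created only for this demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".c", delete=False,
                                 encoding="latin-1") as tmp:
    tmp.write("int x;\nint y;\n")
    sample_path = tmp.name

whole = list(chars_whole_file(sample_path))
by_line = list(chars_by_line(sample_path))
os.remove(sample_path)

print(whole == by_line)
```

Both generators should produce the same character sequence; the second just avoids loading a big file all at once.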
>> To only count ascii chars, as should be the case for C code,
>>
>> achars = [0]*63
>> for c in open('xxx', 'c'):
>>     try:
>>         achars[ord(c)-32] += 1
>>     except IndexError:
>>         pass
>>
>> for i,n in enumerate(achars)
>>     print chr(i), n
>>
>> or sum subsets as desired.
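A cleaned-up take on the quoted snippet, as a Python 3 sketch rather than the exact code above. It departs from the quote in a few labelled ways: it takes a bytes object instead of opening a file, uses 95 slots to cover codes 32 through 126, tests the range explicitly (a negative index would wrap around silently instead of raising IndexError), and adds the 32 back when printing. The function name and sample input are made up for illustration.

```python
def count_printable_ascii(data):
    """Count printable ASCII characters (codes 32..126) in a bytes object."""
    achars = [0] * 95              # one slot per code from 32 to 126
    for b in data:                 # iterating bytes yields ints in Python 3
        if 32 <= b < 127:          # skip control chars and non-ASCII bytes
            achars[b - 32] += 1
    return achars


# Illustrative input; in practice this would be open(path, 'rb').read().
sample = b"int main(void) { return 0; }\n"
achars = count_printable_ascii(sample)

for i, n in enumerate(achars):
    if n:                          # only show characters that occurred
        print(chr(i + 32), n)
```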
>
> Thanks much for the snippet, let me play with it and see if I can come
> up with a Unicode/utf-8 version.. since while I'm at it I might as well
> write something a bit more general than C code.
>
> Since utf-8 is backward-compatible with 7bit ASCII, this shouldn't be
> a problem.
For any extended ascii, use a larger array and count the raw bytes
without decoding (until print, if need be). For unicode, add an encoding
to open() and 'c in line' will return unicode chars. Then use *one* dict
or defaultdict. I think something like:
from collections import defaultdict
d = defaultdict(int)
...
d[c] += 1 # if c is new, d[c] defaults to int() == 0
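Filled out into something runnable, as a sketch assuming Python 3. The function name is made up, and io.StringIO stands in for open(path, encoding='utf-8') so the example needs no file on disk.

```python
import io
from collections import defaultdict


def count_chars(lines):
    """Count every character seen in an iterable of text lines."""
    d = defaultdict(int)
    for line in lines:
        for c in line:
            d[c] += 1      # if c is new, d[c] defaults to int() == 0
    return d


# Stand-in for: open(path, encoding='utf-8')
sample = io.StringIO("caf\u00e9 na\u00efve\ncaf\u00e9\n")
counts = count_chars(sample)

for c, n in sorted(counts.items()):
    print(repr(c), n)
```

One dict handles ascii and non-ascii characters alike, so there is no array size to get wrong.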
Terry Jan Reedy