[Chicago] is there really no built-in file/iter split() thing?
Massimo Di Pierro
mdipierro at cs.depaul.edu
Sun Dec 2 06:50:44 CET 2007
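Try this:
import re, mmap
# map the file read-only instead of reading it all into memory (prot= is Unix-only)
f = open(filename, 'r')
mfile = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
# each match is one run of non-';' characters, i.e. one statement
items = re.compile('[^;]+').finditer(mfile)
for item in items:
    print item.group()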
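For anyone reading this on Python 3, here is a rough sketch of the same mmap + regex idea. It is only a sketch: `iter_statements` is a made-up name, the pattern has to be bytes because the mmap is a bytes-like object, and it assumes the file is not empty (mmap refuses to map a zero-length file).

```python
import mmap
import re

# Hypothetical helper name; same pattern as above, but as a bytes pattern.
STATEMENT = re.compile(b'[^;]+')

def iter_statements(path):
    """Yield each non-empty run of bytes between ';' characters in the file."""
    with open(path, 'rb') as f:
        # access=ACCESS_READ is the portable spelling of a read-only mapping
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mfile:
            for match in STATEMENT.finditer(mfile):
                # group() copies the matched bytes out of the mapping
                yield match.group()
```

Note that '[^;]+' skips empty chunks, so two separators in a row (or a trailing ';') do not produce an empty string the way split(';') would.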
Massimo
On Nov 30, 2007, at 3:49 PM, Kumar McMillan wrote:
> [In the hope that Chris has another awesome response...]
>
> Here is another: I have a big sql file (45M) and need to iter through
> the statements---no fancy sql parsing, I just want the statements.
> Assuming open('big.sql').read().split(';') would be a dumb idea, I
> couldn't find anything in stdlib to do this. What am I missing? I
> thought the tokenize module would but I couldn't see how at first
> glance.
>
> def readsplit(filelike, token):
>     """yields each chunk between tokens in contents of filelike
>     object.
>
>     For example::
>
>         >>> [c for c in readsplit(StringIO('''bad; ass; elf in 
>         ... the forest;'''), ';')]
>         ...
>         ['bad', ' ass', ' elf in \\nthe forest', '']
>         >>> [c for c in readsplit(StringIO(''';
>         ... 1,2,3;
>         ...  and 4; and
>         ... even 5'''), ';')]
>         ...
>         ['', '\\n1,2,3', '\\n and 4', ' and\\neven 5']
>         >>>
>
>     """
>     buf = []
>     for line in filelike:
>         buf.append(line)
>         line = ''.join(buf)
>         buf[:] = []
>         chunks = line.split(token)
>         for chunk in chunks[:-1]:
>             yield chunk
>         buf.append(chunks[-1])
>     if len(buf):
>         yield ''.join(buf)
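
The line-based loop in the quoted readsplit() only keeps memory bounded when the file has reasonably short lines. A rough sketch of a block-based variant, for Python 3: `readsplit_blocks` and `blocksize` are names invented here for illustration, not anything from the stdlib, and the filelike object is assumed to be opened in text mode.

```python
import io

def readsplit_blocks(filelike, token, blocksize=64 * 1024):
    """Yield each chunk between tokens, reading fixed-size blocks.

    Hypothetical variant of readsplit(); produces the same pieces as
    filelike.read().split(token) without reading everything at once.
    """
    tail = ''
    while True:
        block = filelike.read(blocksize)
        if not block:
            break
        pieces = (tail + block).split(token)
        # The last piece may be incomplete, or may end in the first part
        # of a multi-character token, so hold it back for the next block.
        tail = pieces.pop()
        for piece in pieces:
            yield piece
    # Whatever is left after the final token (possibly the empty string)
    yield tail

# Example: a tiny block size to force splits across block boundaries
chunks = list(readsplit_blocks(io.StringIO('a;b;;c'), ';', blocksize=2))
```

Unlike the quoted version, this always yields the final piece, so an empty file gives a single empty string, the same as ''.split(';').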