[Chicago] is there really no built-in file/iter split() thing?

Massimo Di Pierro mdipierro at cs.depaul.edu
Sun Dec 2 06:50:44 CET 2007


Try this

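import re, mmap
# filename is the path to the big file to scan.
# Map it read-only so the regex can scan it without reading it all in.
f = open(filename, 'r')
mfile = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
# Each match is one run of non-';' characters, i.e. one statement.
items = re.compile('[^;]+').finditer(mfile)
for item in items:
    print item.group()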
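
The snippet above is Python 2. On Python 3 the same idea can be wrapped as a generator, roughly as sketched below. The name iter_chunks and its sep parameter are made up here for illustration and are not a stdlib API. The sketch assumes sep is a single byte, and the pattern has to be bytes because an mmap exposes bytes. Note that, unlike split(';'), a '[^;]+' pattern never yields empty chunks.

```python
import mmap
import re


def iter_chunks(path, sep=b";"):
    """Yield each non-empty run of bytes between occurrences of sep.

    Hypothetical helper; assumes sep is a single byte.
    """
    # Character class matching any byte except the separator.
    pattern = re.compile(b"[^" + re.escape(sep) + b"]+")
    with open(path, "rb") as f:
        # mmap refuses to map an empty file, so stop early in that case.
        if f.seek(0, 2) == 0:
            return
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mfile:
            for match in pattern.finditer(mfile):
                yield match.group()
```

Each chunk comes back as bytes, so call .decode() on it if text is needed.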

Massimo

On Nov 30, 2007, at 3:49 PM, Kumar McMillan wrote:

> [In the hope that Chris has another awesome response...]
>
> Here is another: I have a big sql file (45M) and need to iter through
> the statements---no fancy sql parsing, I just want the statements.
> Assuming open('big.sql').read().split(';') would be a dumb idea, I
> couldn't find anything in stdlib to do this.  What am I missing?  I
> thought the tokenize module would but I couldn't see how at first
> glance.
>
> def readsplit(filelike, token):
>     """yields each chunk between tokens in contents of filelike object.
>
>     For example::
>
>         >>> [c for c in readsplit(StringIO('''bad; ass; elf in
>         ... the forest;'''), ';')]
>         ...
>         ['bad', ' ass', ' elf in \\nthe forest', '']
>         >>> [c for c in readsplit(StringIO(''';
>         ... 1,2,3;
>         ...    and 4; and
>         ... even 5'''), ';')]
>         ...
>         ['', '\\n1,2,3', '\\n   and 4', ' and\\neven 5']
>         >>>
>
>     """
>     buf = []
>     for line in filelike:
>         buf.append(line)
>         line = ''.join(buf)
>         buf[:] = []
>         chunks = line.split(token)
>         for chunk in chunks[:-1]:
>             yield chunk
>         buf.append(chunks[-1])
>     if len(buf):
>         yield ''.join(buf)
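
A possible variation on the readsplit generator quoted above, sketched for Python 3: it reads fixed-size blocks instead of lines, so memory stays bounded even when the file has very long lines, and it handles multi-character tokens that straddle a block boundary. The name readsplit_blocks and the blocksize parameter are assumptions for illustration only, not anything from the stdlib.

```python
def readsplit_blocks(filelike, token, blocksize=64 * 1024):
    """Yield each chunk between occurrences of token in filelike.

    Hypothetical variant of readsplit that reads fixed-size blocks.
    """
    tail = ""  # text seen after the last complete token so far
    while True:
        block = filelike.read(blocksize)
        if not block:
            break
        chunks = (tail + block).split(token)
        # Every piece but the last is complete. The last one may continue
        # in the next block, or end with the first part of a token.
        for chunk in chunks[:-1]:
            yield chunk
        tail = chunks[-1]
    # Whatever follows the final token (possibly the empty string).
    yield tail
```

For a text-mode file object this should yield the same pieces as calling split(token) on the whole contents, including a trailing '' when the data ends with the token. Unlike the original, it also yields a single '' for an empty file.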


