[Chicago] is there really no built-in file/iter split() thing?

Kumar McMillan kumar.mcmillan at gmail.com
Sun Dec 2 07:23:30 CET 2007


On Dec 1, 2007 11:50 PM, Massimo Di Pierro <mdipierro at cs.depaul.edu> wrote:
> Try this
>
> import re, mmap
> file=open(filename,'r')
> mfile=mmap.mmap(file.fileno(),0,prot=mmap.PROT_READ)
> items=re.compile('[^;]+').finditer(mfile)
> for item in items: print item.group()

Nice!  I didn't know about mmap.
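
For anyone trying the mmap approach on Python 3, here is a minimal sketch
of the same idea. The function name iter_statements and the path argument
are made up for illustration. It assumes the file is opened in binary mode
(so the regex has to be a bytes pattern) and uses access=mmap.ACCESS_READ,
which is the portable spelling of the read-only mapping. As with the
snippet above, empty chunks are skipped, and mmap refuses to map an empty
file.

```python
import mmap
import re


def iter_statements(path):
    """Yield each non-empty run of bytes between ';' separators.

    Hypothetical helper: a sketch of the mmap + finditer idea, not a
    drop-in SQL parser (a ';' inside a string literal still splits).
    """
    pattern = re.compile(b"[^;]+")  # bytes pattern, since mmap exposes bytes
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the OS pages it in on demand,
        # so the file is never read into one big Python string.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            for match in pattern.finditer(mapped):
                yield match.group()
```

Each yielded chunk is a bytes object, so call .decode() on it if you need
text.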

>
> Massimo
>
> On Nov 30, 2007, at 3:49 PM, Kumar McMillan wrote:
>
>
> > [In the hope that Chris has another awesome response...]
> >
> > Here is another: I have a big SQL file (45M) and need to iterate through
> > the statements---no fancy SQL parsing, I just want the statements.
> > Assuming open('big.sql').read().split(';') would be a dumb idea, I
> > couldn't find anything in stdlib to do this.  What am I missing?  I
> > thought the tokenize module would do it, but I couldn't see how at first
> > glance.
> >
> > def readsplit(filelike, token):
> >     """yields each chunk between tokens in contents of filelike object.
> >
> >     For example::
> >
> >         >>> [c for c in readsplit(StringIO('''bad; ass; elf in
> >         ... the forest;'''), ';')]
> >         ...
> >         ['bad', ' ass', ' elf in\\nthe forest', '']
> >         >>> [c for c in readsplit(StringIO(''';
> >         ... 1,2,3;
> >         ...    and 4; and
> >         ... even 5'''), ';')]
> >         ...
> >         ['', '\\n1,2,3', '\\n   and 4', ' and\\neven 5']
> >
> >     """
> >     buf = []
> >     for line in filelike:
> >         buf.append(line)
> >         line = ''.join(buf)
> >         buf[:] = []
> >         chunks = line.split(token)
> >         for chunk in chunks[:-1]:
> >             yield chunk
> >         buf.append(chunks[-1])
> >     if len(buf):
> >         yield ''.join(buf)
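
One limitation of iterating line by line is that a file with no newlines
still ends up in memory all at once. A possible variant, sketched below,
reads fixed-size blocks instead. The name readsplit_chunks and the
chunk_size parameter are assumptions for illustration, not anything from
the stdlib. Unlike readsplit above, it yields a single empty string for an
empty file, which is what str.split() would do.

```python
def readsplit_chunks(filelike, token, chunk_size=64 * 1024):
    """Yield each piece between occurrences of token in filelike.

    Hypothetical variant of readsplit: reads chunk_size characters at a
    time rather than relying on line boundaries.
    """
    tail = ""
    while True:
        block = filelike.read(chunk_size)
        if not block:
            break
        # Prepend the unfinished tail so a token that straddles two
        # reads is still found.
        pieces = (tail + block).split(token)
        tail = pieces.pop()  # may be incomplete; hold it for the next read
        for piece in pieces:
            yield piece
    # Like str.split(), always emit whatever follows the last token,
    # even when it is empty.
    yield tail
```

Memory use is then bounded by chunk_size plus the longest single
statement, and multi-character tokens work as well.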
>
>

