[Tutor] Read-ahead for large fixed-width binary files?
Marc Tompkins
marc.tompkins at gmail.com
Sun Nov 18 18:09:20 CET 2007
On Nov 18, 2007 5:15 AM, Kent Johnson <kent37 at tds.net> wrote:
> Marc Tompkins wrote:
> > On Nov 17, 2007 8:20 PM, Kent Johnson <kent37 at tds.net
> > <mailto:kent37 at tds.net>> wrote:
> > use plain slicing to return the individual records instead of
> StringIO.
> >
> > I hope I'm not being obtuse, but could you clarify that?
>
> I think it will simplify the looping. A sketch, probably needs work:
>
> def by_record(path, recLen):
>     with open(path, 'rb') as inFile:
>         inFile.read(recLen)  # throw away the header record
>         while True:
>             buf = inFile.read(recLen*4096)
>             if not buf:
>                 return
>             for ix in range(0, len(buf), recLen):
>                 yield buf[ix:ix+recLen]
>
> > I'm not sure I see how this makes my
> > life better than using StringIO (especially since I'm actually using
> > cStringIO, with a "just-in-case" fallback in the import section, and it
> > seems to be pretty fast.)
>
> This version seems simpler and more readable to me.
>
> Kent
>
It does look lean and mean, true. I'll time this against the cStringIO
version. One thing, though - I think I need to do
> if len(buf) < recLen:
>     return
>
rather than
> if not buf:
>     return
>
I'll have to experiment again to refresh my memory, but I believe I tried
the "if not buf" approach in one of my first iterations (about a year ago,
so I may be remembering wrong). If I remember correctly, read() was still
returning a result at the end of the file - one whose length didn't
evaluate to false - so when I tried to slice the last (partial) record,
hilarity ensued.
Of course, I may have hallucinated that while on an extended caffeine jag,
so feel free to disregard!
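For what it's worth, here's roughly what I mean (an untested sketch - the
record size and file contents are just examples, not my real data):

```python
def by_record(path, recLen):
    """Yield fixed-width records, skipping the header record and
    stopping if the final read comes back shorter than one record."""
    with open(path, 'rb') as inFile:
        inFile.read(recLen)  # throw away the header record
        while True:
            buf = inFile.read(recLen * 4096)
            if len(buf) < recLen:
                # EOF, or a short (partial) trailing chunk - either way,
                # there's nothing left that's safe to slice
                return
            # slice out only the complete records in this chunk
            for ix in range(0, len(buf) - recLen + 1, recLen):
                yield buf[ix:ix + recLen]
```

So with a file laid out as header + two full records + a partial tail
(say b'HDR0' + b'AAAA' + b'BBBB' + b'CC' with recLen=4), this yields
b'AAAA' and b'BBBB' and quietly drops the trailing b'CC' instead of
handing back a short slice.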
--
www.fsrtechnologies.com