proposal: another file iterator
Jean-Paul Calderone
exarkun at divmod.com
Sun Jan 15 21:20:59 EST 2006
On 15 Jan 2006 16:44:24 -0800, Paul Rubin <"http://phr.cx"@nospam.invalid> wrote:
>I find pretty often that I want to loop through characters in a file:
>
>    while True:
>        c = f.read(1)
>        if not c: break
>        ...
>
>or sometimes with some other blocksize instead of 1. It would sure
>be easier to say something like:
>
>    for c in f.iterbytes(): ...
>
>or
>
>    for c in f.iterbytes(blocksize): ...
>
>This isn't anything terribly advanced, but it just seems like a matter of
>having the built-in types keep up with language features. The current
>built-in iterator (for line in file: ...) is useful for text files but
>can potentially read strings of unbounded size, so it's inadvisable for
>arbitrary files.
>
>Does anyone else like this idea?
It's a pretty useful thing to do, but the edge cases are somewhat complex. When I just want the dumb version, I tend to write this:
    for chunk in iter(lambda: f.read(blocksize), ''):
        ...
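For reference, here is that idiom as a self-contained sketch (Python 2 semantics, where read() on a file opened in binary mode returns '' at EOF); the file name and the byte-counting loop body are made up for illustration:

    blocksize = 4096
    f = open('data.bin', 'rb')    # hypothetical input file
    total = 0
    # iter(callable, sentinel) calls the callable repeatedly and stops
    # as soon as it returns the sentinel, which here is the '' that
    # read() produces at end of file.
    for chunk in iter(lambda: f.read(blocksize), ''):
        total += len(chunk)       # stand-in for real per-chunk work
    f.close()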
That's only slightly longer than your version. I would like it even more if iter() had been written with the impending doom of lambda in mind, so that this would work:
    for chunk in iter('', f.read, blocksize):
        ...
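A workaround that keeps iter()'s existing signature is to bind the argument ahead of time; a minimal sketch, assuming functools.partial (new in Python 2.5) is available:

    import functools

    f = open('data.bin', 'rb')    # hypothetical input file
    # partial binds the blocksize argument up front, so iter() simply
    # calls f.read(4096) until it returns the '' sentinel at EOF.
    for chunk in iter(functools.partial(f.read, 4096), ''):
        pass                      # stand-in for real per-chunk work
    f.close()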
But it's a bit late to change iter() itself now. Anyhow, here are some questions about your iterbytes():
* Would it guarantee that the chunks returned were read using a single read? If blocksize were a multiple of the filesystem block size, would it guarantee reads on block boundaries (where possible)?
* How would it handle EOF? Would it stop iterating immediately after the first short read or would it wait for an empty return?
* What would the buffering behavior be? Could one interleave calls to .next() on whatever iterbytes() returns with calls to .read() on the file?
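For concreteness, here is one possible set of answers, sketched as a free-standing generator; the name iterbytes and the stop-at-first-empty-read behavior are assumptions about the proposal, not settled semantics:

    def iterbytes(f, blocksize=1):
        # No single-read or block-boundary guarantee here: whatever
        # f.read() hands back, including short reads near EOF, is
        # yielded as-is, and iteration ends at the first empty return.
        while True:
            chunk = f.read(blocksize)
            if not chunk:
                break
            yield chunk

Since this sketch reads through f's own read() method, it would share the file position and buffering with any interleaved .read() calls, which is one possible answer to the third question.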
Jean-Paul