Andrew> True, but note that you can compile Python with WITHOUT_COMPLEX
Andrew> defined to remove complex numbers.
That's true, but that wasn't my point. I'm not arguing for or against space efficiency, just that the rather timeworn argument about not doing anything special to support text processing because Python is a general purpose language is a red herring.
>> 1. When using something like the simple file i/o idiom
>>        for line in f.readlines():
>>            dofunstuff(line)
>>    the programmer should not have to care how big the file is.
Andrew> What about 'for line in fileinput.input()', which already
Andrew> exists? (Hmmm... if you have an already open file object, I
Andrew> don't think you can pass it to fileinput.input(); maybe that
Andrew> should be fixed.)
Well, a couple reasons jump to mind:
1. fileinput.FileInput isn't particularly efficient. At its heart, its __getitem__ method makes a simple readline() call instead of buffering some amount of readlines(sizehint) bytes. This can be fixed, but I'm not sure what would happen to its semantics.
2. As you pointed out, it's not all that general.
My point, not at all well stated, is that the programmer shouldn't have to worry (much?) about the conditions under which he does file i/o. Right now, if I know the file is small(ish), I can do
    for line in f.readlines():
        dofunstuff(line)
but I have to know that the file won't be big, because readlines() will behave badly (perhaps even generate a MemoryError exception) if the file is large. In that case, I have to fall back to the safer (and slower)
    line = f.readline()
    while line:
        dofunstuff(line)
        line = f.readline()
or the more efficient, but more cumbersome
    lines = f.readlines(sizehint)
    while lines:
        for line in lines:
            dofunstuff(line)
        lines = f.readlines(sizehint)
That's three separate idioms the programmer has to be aware of when writing code to read a text file based upon the perceived need for speed, memory usage and desired clarity:
    fast/memory-intensive/clear
    slow/memory-conserving/not-as-clear
    fast/memory-conserving/fairly-muddy
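The third idiom can at least be written once and reused. Here is a minimal sketch, assuming a helper named each_line (a hypothetical name, not part of any library) and an arbitrary default sizehint; it only packages the readlines(sizehint) loop shown above.

```python
import io

def each_line(f, func, sizehint=65536):
    """Call func(line) for every line in f, reading in chunks.

    each_line and the 65536 default are illustrative assumptions; the
    loop body is the readlines(sizehint) idiom from the message.
    """
    lines = f.readlines(sizehint)
    while lines:
        for line in lines:
            func(line)
        # Fetch the next batch; an empty list signals end of file.
        lines = f.readlines(sizehint)

# Usage with an in-memory file standing in for a real one.
seen = []
each_line(io.StringIO("x\ny\nz\n"), seen.append, sizehint=1)
```

This hides the muddy loop, but the caller still has to know the helper exists, which is the complaint being made here.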
Any particular reason that the readlines method can't return an iterator that supports __getitem__ and buffers input? (Again, remember this is for py2k, so the potential breakage such a change might cause is a consideration, but not a showstopper.)
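To make the question concrete, here is a rough sketch of what such an object might look like. The class name BufferedLines, its attributes, and the default sizehint are all assumptions made for illustration; this is not how any existing file method behaves. It relies on the for loop calling __getitem__ with 0, 1, 2, ... until IndexError is raised, and refills its buffer with readlines(sizehint).

```python
import io

class BufferedLines:
    """Hypothetical lazy line sequence over an open file object."""

    def __init__(self, f, sizehint=8192):
        self.f = f
        self.sizehint = sizehint
        self.buf = []        # lines read from the file but not yet returned
        self.pos = 0         # position of the next line within self.buf
        self.next_index = 0  # index the for loop is expected to ask for

    def __getitem__(self, i):
        # Only sequential access is supported; that is all a for loop needs.
        if i != self.next_index:
            raise RuntimeError("lines must be accessed sequentially")
        if self.pos >= len(self.buf):
            # Buffer exhausted: read roughly sizehint more bytes of lines.
            self.buf = self.f.readlines(self.sizehint)
            self.pos = 0
            if not self.buf:
                raise IndexError("end of file")
        line = self.buf[self.pos]
        self.pos += 1
        self.next_index += 1
        return line

# Usage: reads like the simple idiom, but never holds the whole file.
collected = []
for line in BufferedLines(io.StringIO("a\nb\nc\n"), sizehint=2):
    collected.append(line)
```

The cost is that the result is no longer a real list, so code that indexes it at random or calls len() on it would break, which is the kind of breakage mentioned above.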
Andrew> On a vaguely related note, since there are many things like
Andrew> parser generators and XML stuff and mxTextTools, I've been
Andrew> speculating about a text processing topic guide. If you know of
Andrew> Python packages related to text processing, please send me a
Andrew> private e-mail with a link.
This sounds like a good idea to me.