String Format Conversion

Jeff Shannon jeff at ccvcorp.com
Thu Jan 27 14:34:31 EST 2005


Stephen Thorne wrote:

> On Thu, 27 Jan 2005 00:02:45 -0700, Steven Bethard
> <steven.bethard at gmail.com> wrote:
> 
>>By using the iterator instead of readlines, I read only one line from
>>the file into memory at once, instead of all of them.  This may or may
>>not matter depending on the size of your files, but using iterators is
>>generally more scalable, though of course it's not always possible.
> 
> I just did a teensy test. All three options used exactly the same
> amount of total memory.

I would presume that, for a small file, the entire contents of the 
file will be sucked into the read buffer implemented by the underlying 
C file library.  An iterator will only really reduce memory consumption 
when the file size is greater than that buffer's size.
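
As a rough sketch of that threshold: in current Python 3 the buffering 
is done by the io module rather than by the C library, so 
io.DEFAULT_BUFFER_SIZE stands in here for "that buffer's size".  The 
helper name fits_in_default_buffer is made up for illustration, and the 
actual buffer used for a given file may differ from the default.

```python
import io
import os
import tempfile


def fits_in_default_buffer(path):
    """Return True if the file is no larger than Python's default I/O buffer.

    Assumption: the file is opened with default buffering, so the buffer
    is roughly io.DEFAULT_BUFFER_SIZE bytes.  open() may pick a different
    size based on the underlying device, so treat this as an estimate.
    """
    return os.path.getsize(path) <= io.DEFAULT_BUFFER_SIZE


if __name__ == "__main__":
    # Write a tiny temporary file and check it against the default buffer.
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        f.write("a teensy test\n")
    print(io.DEFAULT_BUFFER_SIZE, fits_in_default_buffer(path))
    os.remove(path)
```

If this returns True for the test file, seeing the same memory use for 
all three options would not be surprising.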

Actually, now that I think of it, there's probably another copy of the 
data at Python level.  For readlines(), that copy is the list object 
itself.  For iter and iter.next(), it's in the iterator's read-ahead 
buffer.  So perhaps memory savings will occur when *that* buffer size 
is exceeded.  It's also quite possible that both buffers are the same 
size...
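
One way to check this guess rather than presume is to measure the peak 
Python-level allocation for each approach on a file that is clearly 
larger than any buffer.  The following is only a sketch for current 
Python 3, using the standard tracemalloc module; the function names and 
the generated test file are made up for illustration.  tracemalloc only 
sees memory allocated through Python's allocators, so it will not count 
any buffer held by the C library.

```python
import os
import tempfile
import tracemalloc


def peak_bytes(func, path):
    """Return the peak traced allocation, in bytes, while func(path) runs."""
    tracemalloc.start()
    try:
        func(path)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def count_with_readlines(path):
    # readlines() builds a list holding every line of the file at once.
    with open(path) as f:
        return len(f.readlines())


def count_with_iterator(path):
    # Iterating the file object keeps only the current line (plus the
    # file object's internal buffer) alive at any moment.
    count = 0
    with open(path) as f:
        for _line in f:
            count += 1
    return count


def make_test_file(num_lines, line_length=80):
    """Write a temporary text file of identical lines and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        for _ in range(num_lines):
            f.write("x" * line_length + "\n")
    return path


if __name__ == "__main__":
    path = make_test_file(20000)  # roughly 1.6 MB, well past a typical buffer
    print("readlines:", peak_bytes(count_with_readlines, path))
    print("iterator: ", peak_bytes(count_with_iterator, path))
    os.remove(path)
```

With a file this size the readlines() figure should be much larger than 
the iterator figure; shrinking num_lines until the two converge gives a 
feel for where the buffer stops hiding the difference.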

Anyhow, I'm sure that the fact that they use the same amount of memory 
in your test is a reflection of buffering.  The next question is, which 
provides the most *conceptual* simplicity?  (The answer to that one, I 
think, depends on how your brain happens to see things...)

Jeff Shannon
Technician/Programmer
Credit International



