[Tutor] If you don't close file when writing, do bytes stay in memory?

Xbox Muncher xboxmuncher at gmail.com
Sat Oct 10 16:32:09 CEST 2009


What does flush do technically?
"Flush the internal buffer, like stdio's fflush(). This may be a no-op on
some file-like objects."
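As a quick illustration of that "may be a no-op" caveat: an in-memory file-like object such as io.StringIO still has a flush() method, but there is no OS-level buffer behind it, so the call does nothing. A minimal sketch:

```python
import io

# flush() exists on any file-like object, but for an in-memory
# stream there is no underlying OS buffer to push, so it's a no-op.
buf = io.StringIO()
buf.write("hello")
buf.flush()            # accepted, but changes nothing
print(buf.getvalue())  # 'hello' with or without the flush
```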

The reason I thought closing the file after writing about 500MB of data to
it was smart is that I assumed Python stores that data in memory (or keeps
some record of it) and only releases that memory when I close the file.
When I write to a file in 'wb' mode, 500 bytes at a time, I see the file
size change as I add more data; maybe not in exact 500-byte increments
matching my code logic, but it does keep growing as I iterate.

Seeing this, I know that the data is definitely being written to the file
fairly promptly and not being held in memory for very long. Or is it...?
Does it still keep the data in this "internal buffer" if I don't close the
file? If it does, then flush() is exactly what I need to free the internal
buffer, which is what I was trying to accomplish by closing the file anyway...

However, from your replies I take it that Python doesn't hold this data in
an internal buffer and DOES write it out to the file itself right away (of
course it still exists in whatever variables I put it in). So closing the
file doesn't free up any more memory.
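To make the buffering visible, here is a minimal sketch (Python 3 syntax; the temp-file path and the explicit 8 KiB buffer size are just for the demo, not from the original post). A write smaller than the buffer sits in Python's internal buffer and doesn't reach the file until flush():

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.bin")
f = open(path, "wb", buffering=8192)   # explicit 8 KiB buffer for the demo
f.write(b"x" * 100)                    # fits in the buffer: nothing on disk yet
size_before = os.path.getsize(path)    # 0 - the bytes are still buffered
f.flush()                              # push the buffer out to the OS
size_after = os.path.getsize(path)     # 100 - now visible in the file
print(size_before, size_after)
f.close()
```

This is also why the file appeared to grow during the 500-bytes-at-a-time loop: each time the buffer fills up, it is written out automatically, so the file grows in buffer-sized steps rather than per write() call.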

On Sat, Oct 10, 2009 at 7:02 AM, Dave Angel <davea at ieee.org> wrote:

> xbmuncher wrote:
>
>> Which piece of code will conserve more memory?
>>  I think that code #2 will because I close the file more often, thus
>> freeing
>> more memory by closing it.
>> Am I right in this thinking... or does it not save me any more bytes in
>> memory by closing the file often?
>> Sure I realize that in my example it doesn't save much if it does... but
>> I'm
>> dealing with writing large files.. so every byte freed in memory counts.
>> Thanks.
>>
>> CODE #1:
>> def getData(): return '12345' #5 bytes
>> f = open('file.ext', 'wb')
>> for i in range(2000):
>>    f.write(getData())
>>
>> f.close()
>>
>>
>> CODE #2:
>> def getData(): return '12345' #5 bytes
>> f = open('file.ext', 'wb')
>> for i in range(2000):
>>    f.write(getData())
>>    if i == 5:
>>        f.close()
>>        f = open('file.ext', 'ab')
>>        i = 1
>>    i = i + 1
>>
>> f.close()
>>
>>
>>
> You don't save a noticeable amount of memory usage by closing and
> immediately reopening the file.  The amount that the system buffers probably
> wouldn't depend on file size, in any case.  When dealing with large files,
> the thing to watch is how much of the data you've got in your own lists and
> dictionaries, not how much the file subsystem and OS are using.
>
> But you have other issues in your code.
>
> 1) you don't say what version of Python you're using.  So I'll assume it's
> version 2.x.  If so, then range is unnecessarily using a lot of memory.  It
> builds a list of ints, when an iterator would do just as well.  Use
> xrange().  ( In Python 3.x, xrange() was renamed to be called range(). )
>  This may not matter for small values, but as the number gets bigger, so
> would the amount of wastage.
>
> 2) By using the same local for the for loop as for your "should I close"
> counter, you're defeating the logic.  As it stands, it'll only do the
> close() once.  Either rename one of these, or do the simpler test, of
>     if i%5 == 0:
>          f.close()
>          f = open....
>
> 3) Close and re-open has three other effects.  One, it's slow.  Two,
> append-mode isn't guaranteed by the C standard to always position at the end
> (!).  And three, it flushes the data.  That can be a very useful result, in
> case the computer crashes while spending a long time updating a file.
>
> I'd suggest sometimes doing a flush() call on the file, if you know you'll
> be spending a long time updating it.  But I wouldn't bother closing it.
>
> DaveA
>
>
>
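Putting Dave's three points together, a sketch of CODE #2 with the fixes applied (a modulo test instead of the reused loop variable, and a periodic flush() instead of close-and-reopen; the 5-byte getData() is from the original post, written as a bytes literal for 'wb' mode on Python 3):

```python
def getData():
    return b'12345'  # 5 bytes; bytes literal for 'wb' mode on Python 3

f = open('file.ext', 'wb')
for i in range(2000):    # on Python 2, use xrange(2000) per Dave's point 1
    f.write(getData())
    if i % 5 == 0:       # point 2: modulo test, don't reuse the loop variable
        f.flush()        # point 3: flush instead of the slow close/re-open
f.close()
```

The result is the same 10000-byte file as CODE #1, but with the data pushed out to the OS every five writes, limiting how much is lost if the program crashes mid-run.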