[Tutor] If you don't close file when writing, do bytes stay in memory?
Dave Angel
davea at ieee.org
Sat Oct 10 19:36:09 CEST 2009
Kent Johnson wrote:
> 2009/10/10 Xbox Muncher <xboxmuncher at gmail.com>:
>
>> What does flush do technically?
>> "Flush the internal buffer, like stdio's fflush(). This may be a no-op on some file-like objects."
>>
>> The reason I thought that closing the file after writing about 500MB of data to it was smart is that I thought Python stores that data in memory, or keeps info about it somehow, and only frees that memory when I close the file.
>> When I write to a file in 'wb' mode 500 bytes at a time, I see that the file size changes as I continue to add more data; maybe not in exact 500-byte steps as in my code logic, but it keeps growing as I make more iterations.
>>
>> Seeing this, I know that the data is definitely being written pretty immediately to the file and not being held in memory for very long. Or is it...? Does it still keep it in this "internal buffer" if I don't close the file. If it does, then flush() is exactly what I need to free the internal buffer, which is what I was trying to do when I closed the file anyways...
>>
>> However, from your replies I take it that python doesn't store this data in an internal buffer and DOES immediately dispose of the data into the file itself (of course it still exists in variables I put it in). So, closing the file doesn't free up any more memory.
>>
>
> Python file I/O is buffered. That means that there is a memory buffer
> that is used to hold a small amount of the file as it is read or
> written.
>
> Your original example writes 5 bytes at a time. With unbuffered I/O,
> this would write to the disk on every call to write(). (The OS also
> has some buffering, I'm ignoring that.)
>
> With buffered writes, there is a memory buffer allocated to hold the
> data. The write() call just puts data into the buffer; when it is
> full, the buffer is written to the disk. This is a flush. Calling
> flush() forces the buffer to be written.
>
> So, a few points about your questions:
> - calling flush() after each write() will cause a disk write. This is
> probably not what you want, it will slow down the output considerably.
> - calling flush() does not de-allocate the buffer, it just writes its
> contents. So calling flush() should not change the amount of memory
> used.
> - the buffer is pretty small, maybe 8K or 32K. You can specify the
> buffer size as an argument to open() but really you probably want the
> system default.
>
> Kent
>
>
What Kent said.

I brought up flush(), not because you should do it on every write, but
because you might want to do it on a file that's open a long time,
either because it's very large, or because you're doing other things
while keeping the file open. A flush() pretty much assures that this
portion of the file is recoverable if your program subsequently crashes.
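
To see the buffering described above in action, here is a minimal sketch, assuming CPython 3 and its default buffer size for binary files (a few kilobytes, platform dependent). The helper name sizes_before_and_after_flush is made up for illustration. It writes 5 bytes and asks the OS for the file size before and after flush().

```python
import os
import tempfile


def sizes_before_and_after_flush(chunk=b"12345"):
    """Write one small chunk and report the file size the OS sees
    before and after flush(). (Hypothetical helper, for illustration.)"""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:          # default buffering
            f.write(chunk)                   # lands in Python's buffer only
            before = os.path.getsize(path)   # size as the OS currently sees it
            f.flush()                        # hand the buffered bytes to the OS
            after = os.path.getsize(path)
        return before, after
    finally:
        os.remove(path)


before, after = sizes_before_and_after_flush()
print(before, after)
```

With the default buffer, the size before flush() should be 0 and the size after should be 5. Note that flush() only hands the bytes to the operating system; it does not release the buffer.
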

The operating system itself is also doing some buffering. After all,
the disk drive writes whole sectors of at least 512 bytes, so if
you write 12 bytes and flush, the OS needs to know at least about the
other 500 bytes in that sector. The strategy of this buffering varies
depending on lots of things outside of the control of your Python
program. For example, a removable drive can be mounted either for "fast
access" or for "most likely to be recoverable if removed unexpectedly."
These settings (OS-specific, and even version-specific) lead to much
more buffering than Python or the C runtime library will ever do.

Incidentally, just because you can see that the file has grown, that
doesn't mean the disk drive itself has been updated. It just means that
the in-memory version of the directory entries has been updated. Those
are buffered as well, naturally. If they weren't, then writing
performance would be truly horrendous.
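
If that distinction matters, the standard library's os.fsync() asks the operating system to write its own buffers for a file out to the device. The sketch below is a hedged illustration, not a guarantee: write_durably is a hypothetical helper name, and the drive's own cache or the mount options may still delay the physical write.

```python
import os
import tempfile


def write_durably(path, data):
    """Write data, then ask both Python and the OS to push it out.
    (Hypothetical helper, for illustration only.)"""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()               # Python's buffer -> operating system
        os.fsync(f.fileno())    # ask the OS to write its buffers to the device
    return os.path.getsize(path)


# Demo on a throwaway temporary file.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
try:
    size = write_durably(demo_path, b"x" * 12)
    print(size)
finally:
    os.remove(demo_path)
```

Calling os.fsync() is considerably slower than flush() alone, so the same tradeoff applies: use it only at points where losing the data would really hurt.
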

Anyway, don't bother closing and re-opening, unless it's to let some
other process get access to the file. And use flush() judiciously, if
at all, considering the tradeoffs.

Did you follow my comment about using the modulo operator to do
something every nth time through a loop?
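
The modulo idea mentioned above can be sketched as follows. This is an illustration only: write_chunks and flush_every are hypothetical names, and 1000 is an arbitrary interval, not a recommendation. An in-memory io.BytesIO stands in for the real file so the example runs anywhere.

```python
import io


def write_chunks(f, chunks, flush_every=1000):
    """Write each chunk, calling flush() on every flush_every-th write.
    Returns how many times flush() was called."""
    flushes = 0
    for i, chunk in enumerate(chunks, start=1):
        f.write(chunk)
        if i % flush_every == 0:    # true on writes 1000, 2000, 3000, ...
            f.flush()
            flushes += 1
    return flushes


buf = io.BytesIO()                  # stand-in for a file opened in 'wb' mode
count = write_chunks(buf, [b"x" * 500] * 2500, flush_every=1000)
print(count, len(buf.getvalue()))
```

Here 2500 writes of 500 bytes each trigger flush() only twice, instead of 2500 times.
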
DaveA