[Python-3000] iostack, second revision

Fri Sep 15 01:05:28 CEST 2006

Josiah Carlson wrote:
 > Any sane person uses os.stat(f.name) or os.fstat(f.fileno()), unless
 > they want to seek to the end of the file for later writing or expected
 > reading of data yet-to-be-written.

os.fstat(f.fileno()).st_size doesn't work for file-like objects.
Goodbye unit testing with StringIOs.  f.seek(0,2);f.tell() is faster,
too.  I think the lunatics have a point.
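
A minimal sketch of the seek/tell idiom, for illustration only.  It is
written against today's io module rather than the 2006 StringIO module,
and the helper name stream_size is made up for this example:

```python
import io
import os

def stream_size(f):
    """Size of a seekable file-like object, found by seeking to its end."""
    pos = f.tell()                 # remember the current position
    try:
        f.seek(0, os.SEEK_END)     # same as f.seek(0, 2)
        return f.tell()
    finally:
        f.seek(pos)                # restore the position for the caller

buf = io.BytesIO(b"hello world")
print(stream_size(buf))            # 11

try:
    os.fstat(buf.fileno())         # in-memory objects have no descriptor
except io.UnsupportedOperation:
    print("no file descriptor behind this object")
```

The same helper works unchanged on a real file opened in binary mode,
which is what makes it usable in unit tests that substitute an in-memory
object.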

 > You were also talking about buffering writes to reduce the overhead of
 > the underlying seeks and tells because of apparent "optimizations" you
 > wanted to make. Here is a data integrity optimization you can make for
 > me: flush when accessing the file non-sequentially, any other behavior
 > could corrupt the data of users who have been relying on "seek implies
 > flush".

Again, that's what explicit calls to flush are for.  And you can't
violate expectations as to what the seek method does, when there's no
seek method and no concept of a file pointer.
Sprinkling extra flushes here and there does not help data
integrity: only a flush that is part of a well-thought-out plan to
recover partially written data after a crash will help you do
that.  Anything less, and you're just a power failure and a disk that
reorders writes away from unrecoverable corruption.

My class consolidates writes, but doesn't reorder them.  That means
that to the extent that the system call for writing is transactional,
writes are not reordered.  I put the code up at
http://pastecode.com/4818.  As it stands, extending and truncating
have bugs.

If you really want it, disabling buffering for non-sequential writes
is a three-line change.  And an equivalent class completely without
buffering is pretty trivial.
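
To make "consolidate but don't reorder" concrete, here is a rough
sketch.  It is not the code from the paste above; the class name
ConsolidatingWriter and its write_at method are hypothetical, and it
only merges writes that are contiguous with the pending run:

```python
import io

class ConsolidatingWriter:
    """Hypothetical sketch: merge contiguous writes, never reorder them."""

    def __init__(self, raw):
        self.raw = raw              # any seekable, writable binary stream
        self.start = None           # offset of the pending run, None if empty
        self.pending = bytearray()  # bytes not yet handed to raw

    def write_at(self, offset, data):
        # A write that continues the pending run is appended to it.
        if self.start is not None and offset == self.start + len(self.pending):
            self.pending += data
            return
        # Anything else flushes the pending run first, so the underlying
        # stream sees the writes in the order they were issued.
        self.flush()
        self.start = offset
        self.pending = bytearray(data)

    def flush(self):
        if self.start is not None:
            self.raw.seek(self.start)
            self.raw.write(self.pending)
            self.start = None
            self.pending = bytearray()

raw = io.BytesIO()
w = ConsolidatingWriter(raw)
w.write_at(0, b"ab")
w.write_at(2, b"cd")    # contiguous: merged with the first write
w.write_at(10, b"zz")   # non-sequential: b"abcd" goes out as one write
w.flush()
```

In this sketch a non-sequential write always forces the earlier run out
first, which is roughly the behaviour asked for in the quoted paragraph.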

 > With that said, I'm not sure your FileBytes object is really necessary
 > or desired for the future io library.  If people want that kind of an
 > interface, they can use mmap (and push for the various mmap bugs/feature
 > requests to be fixed), otherwise they should be using readable /
 > writable / both streams, something that Tomer has been working towards.

mmap has limitations that cannot be fixed.  It takes up virtual
memory, limiting the size of files you can work with.  You need to
specify the size in advance (note the potential race condition in
f=mmap.mmap(f.fileno(),os.fstat(f.fileno()).st_size)).  To what extent
does it
work over networked file systems?  If you map a file on a file system
that is subsequently unmounted, a core dump may be the result.  All
this assuming the operating system supports mmap at all.

mmap is for use where speed is paramount, and pretty much only then.
The reason people don't use sequence-based file interfaces as much is
that robust, portable, practical sequence-based file interfaces aren't
available.  Probably most people who would have liked a sequence
interface do what I do: slurp up the whole file in one read and deal
with the string.  Or use mmap and live with the fragility.
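
For a sense of what a sequence interface over plain seek and read might
look like, here is a small read-only sketch.  It is an illustration
under assumptions, not the FileBytes class discussed in this thread;
the name FileSequence is made up, and only contiguous slices are
handled:

```python
import io
import os

class FileSequence:
    """Hypothetical read-only sequence view over a seekable binary stream."""

    def __init__(self, f):
        self.f = f

    def __len__(self):
        # Length is found by seeking to the end; no fileno() is required.
        self.f.seek(0, os.SEEK_END)
        return self.f.tell()

    def __getitem__(self, index):
        size = len(self)
        if isinstance(index, slice):
            start, stop, step = index.indices(size)
            if step != 1:
                raise ValueError("only contiguous slices in this sketch")
            self.f.seek(start)
            return self.f.read(max(0, stop - start))
        if index < 0:
            index += size
        if not 0 <= index < size:
            raise IndexError("index out of range")
        self.f.seek(index)
        return self.f.read(1)       # a length-1 bytes object

seq = FileSequence(io.BytesIO(b"0123456789"))
print(len(seq), seq[2:5], seq[-3:])
```

Unlike mmap, nothing here maps the file into the address space: each
slice reads only the bytes requested, at the cost of a seek per access.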

- Anders