[Python-Dev] IO module improvements

Fri Feb 5 14:28:27 CET 2010

Pascal Chambon <pythoniks <at> gmail.com> writes:
> 
> By the way, I'm having trouble with the "name" attribute of raw files, 
> which can be string or integer (confusing), ambiguous if containing a 
> relative path, and which isn't able to handle the new case of my 
library, i.e. opening a file from an existing file handle (which is ALSO
> an integer, like C file descriptors...)

What is the difference between "file handle" and a regular C file descriptor?
Is it some Windows-specific thing?
If so, then perhaps it deserves some Windows-specific attribute ("handle"?).
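To make the ambiguity concrete, here is a minimal sketch that uses a temporary file purely for the example. It shows that `FileIO.name` holds whatever was passed to the constructor: the path string in one case and the integer descriptor in the other.

```python
import io
import os
import tempfile


def name_types():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Opened by path: name is the path string exactly as given.
        with io.FileIO(path, "r") as by_path:
            path_name = by_path.name
        # Opened from an existing descriptor: name is that integer.
        fd = os.open(path, os.O_RDONLY)
        with io.FileIO(fd, "r") as by_fd:  # closefd=True, so fd is closed on exit
            fd_name = by_fd.name
        return type(path_name), type(fd_name)
    finally:
        os.remove(path)


print(name_types())  # (<class 'str'>, <class 'int'>)
```

A relative path passed to the constructor is stored unchanged as well, which is where the relative-path ambiguity comes from.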

> Methods would also deserve some auto-forwarding. If you want to buffer
> a raw stream which also offers size(), times(), lock_file() and other
> methods, how can these be accessed from a top-level buffering/text
> stream?

I think it's a bad idea. If you forget to implement one of the standard IO
methods (e.g. seek()), it will get forwarded to the raw stream, but with the
wrong semantics (because it won't take buffering into account).

It's better to require the implementor to do the forwarding explicitly if
desired, IMO.
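Explicit forwarding can be sketched like this. The sketch assumes a hypothetical raw stream `RawWithSize` that has an extra `size()` method. The buffered subclass forwards only `size()`, and does so by hand, because the total size does not depend on how much data sits in the buffer. Position-dependent methods such as `seek()` and `tell()` stay with `BufferedReader` itself.

```python
import io


class RawWithSize(io.RawIOBase):
    """Hypothetical raw stream over in-memory bytes, with an extra size() method."""

    def __init__(self, data):
        super().__init__()
        self._data = bytes(data)
        self._pos = 0

    def readable(self):
        return True

    def readinto(self, b):
        # Copy as many bytes as fit into the caller's buffer; returning 0 signals EOF.
        chunk = self._data[self._pos:self._pos + len(b)]
        n = len(chunk)
        b[:n] = chunk
        self._pos += n
        return n

    def size(self):
        # Hypothetical extra method, not part of the io ABCs.
        return len(self._data)


class BufferedWithSize(io.BufferedReader):
    """Buffered reader that forwards size() explicitly to its raw stream."""

    def size(self):
        # Safe to forward: the answer does not depend on buffered-but-unread data.
        return self.raw.size()


buf = BufferedWithSize(RawWithSize(b"hello world"))
print(buf.read(5), buf.size())  # b'hello' 11
```

Only methods whose semantics are unaffected by buffering are forwarded. A blanket `__getattr__` would also pass through anything the buffered class forgot to override.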

> - I feel thread-safety locking and stream status checking are
> currently overly complicated. All methods are filled with locking calls 
> and CheckClosed() calls, which is both a performance loss (most io 
> streams will have 3 such levels of locking, when 1 would suffice)

FileIO objects don't have a lock, so there are 2 levels of locking at worst, not
3 (and, actually, TextIOWrapper doesn't have a lock either, although perhaps it
should).
As for the checkClosed() calls, they are probably cheap, especially if they
bypass regular attribute lookup.
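The layers being counted here can be inspected from Python. This is a minimal check of what `open()` returns in text mode, with a temporary file used only for the example. Locks are not visible from Python, so the sketch shows only the three objects; per the reply, neither `FileIO` nor (at the time) `TextIOWrapper` has its own lock.

```python
import os
import tempfile


def layer_names():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.txt")
        with open(path, "w") as f:
            # text layer -> buffered layer -> raw layer
            return [type(f).__name__, type(f.buffer).__name__, type(f.buffer.raw).__name__]


print(layer_names())  # ['TextIOWrapper', 'BufferedWriter', 'FileIO']
```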

> Since we're nesting streams anyway, why not simply add a "safety
> stream" on top of each stream chain returned by open()? That layer
> could gracefully handle mutex locking, CheckClosed() calls, and even,
> maybe, the attribute/method forwarding I mentioned above.
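The proposed "safety stream" might look roughly like this. It is a minimal sketch, and `SafetyStream` is a hypothetical name, not an existing io class. It takes one lock and does one closed check per call, then forwards explicitly to the wrapped stream.

```python
import io
import threading


class SafetyStream:
    """Hypothetical top-level wrapper: one lock and one closed check per call."""

    def __init__(self, stream):
        self._stream = stream
        # RLock so that a method holding the lock may call another wrapped method.
        self._lock = threading.RLock()

    def _check_closed(self):
        if self._stream.closed:
            raise ValueError("I/O operation on closed file.")

    def read(self, size=-1):
        with self._lock:
            self._check_closed()
            return self._stream.read(size)

    def write(self, data):
        with self._lock:
            self._check_closed()
            return self._stream.write(data)

    def close(self):
        with self._lock:
            self._stream.close()

    @property
    def closed(self):
        return self._stream.closed


stream = SafetyStream(io.BytesIO())
stream.write(b"abc")
stream.close()
try:
    stream.write(b"more")
except ValueError as exc:
    print("rejected:", exc)  # rejected: I/O operation on closed file.
```

This shows only the shape of the idea. The wrapped layers still do their own locking and checks here, and every operation pays for one extra Python-level call.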

It's an interesting idea, but it could also end up slower than the current
situation.
First because you are adding a level of indirection (i.e. additional method
lookups and method calls).
Second because currently the locks aren't always taken. For example, in
BufferedReader, we needn't take the lock when the requested data is available
in our buffer (the GIL already protects us). Having a separate "synchronizing"
wrapper would forbid such micro-optimizations.

If you want to experiment with this, you can use iobench (in the Tools
directory) to measure file IO performance.
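Short of running iobench, a rough micro-benchmark can show what one extra level of indirection costs. This sketch uses `timeit` and a hypothetical pure-Python `PassThrough` wrapper around `io.BytesIO`. A pure-Python layer exaggerates the overhead compared with one written in C, and timings vary by machine, so no figures are given.

```python
import io
import timeit


class PassThrough:
    """Hypothetical wrapper that adds one method call per read()."""

    def __init__(self, stream):
        self._stream = stream

    def read(self, size=-1):
        return self._stream.read(size)


DATA = b"x" * 100_000


def drain(stream, chunk=16):
    # Many small reads, so per-call overhead dominates the measurement.
    while stream.read(chunk):
        pass


def bench(number=20):
    direct = timeit.timeit(lambda: drain(io.BytesIO(DATA)), number=number)
    wrapped = timeit.timeit(lambda: drain(PassThrough(io.BytesIO(DATA))), number=number)
    return direct, wrapped


direct, wrapped = bench()
print(f"direct: {direct:.3f}s  wrapped: {wrapped:.3f}s")
```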

> - some semantic decisions of the current system are somewhat dangerous.
> For example, flushing errors occurring on close are swallowed. It seems
> to me that it's of the utmost importance that the user be warned if the
> bytes he wrote disappeared before reaching the kernel; shouldn't we
> decidedly enforce a "don't hide errors" everywhere in the io module ?

It may be a bug. Can you report it, along with a script or test showcasing it?
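A test script of the kind requested might look like this. It is a minimal sketch that assumes a hypothetical raw stream `FailingRaw` whose writes always fail, standing in for something like a full disk. The write fits in the buffer, so the failure only happens when `close()` flushes. On recent CPython 3 releases, `BufferedWriter.close()` should re-raise the flush error after closing the raw stream, so the script should print `True`. `False` would mean the error was swallowed.

```python
import io


class FailingRaw(io.RawIOBase):
    """Hypothetical raw stream whose every write fails."""

    def writable(self):
        return True

    def write(self, b):
        raise OSError("simulated write failure")


def close_reports_error():
    buf = io.BufferedWriter(FailingRaw())
    buf.write(b"data")  # fits in the buffer, so no raw write happens yet
    try:
        buf.close()  # close() flushes, which calls FailingRaw.write
    except OSError:
        return True
    return False


print(close_reports_error())
```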

Regards

Antoine.