[Python-Dev] IO module improvements
chambon.pascal at gmail.com
Sat Feb 6 12:43:08 CET 2010
Antoine Pitrou wrote:
> What is the difference between "file handle" and a regular C file descriptor?
> Is it some Windows-specific thing?
> If so, then perhaps it deserves some Windows-specific attribute ("handle"?).
At the moment it's Windows-specific, but it's not impossible that some
other OSes also rely on specific file handles (only emulating C file
descriptors for compatibility).
I've indeed mirrored the fileno concept, with a "handle" argument for
constructors, and a handle() getter.
> On Fri, Feb 5, 2010 at 5:28 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>> Pascal Chambon <pythoniks <at> gmail.com> writes:
>>> By the way, I'm having trouble with the "name" attribute of raw files,
>>> which can be string or integer (confusing), ambiguous if containing a
>>> relative path,
> Why is it ambiguous? It sounds like you're using str() of the name and
> then can't tell whether the file is named e.g. '1' or whether it
> refers to file descriptor 1 (i.e. sys.stdout).
As Jean-Paul mentioned, what I find confusing is that it can be a
relative path, and sometimes not a path at all. I'm pretty sure many
programmers haven't even considered in their library code that it could
be a non-string, and use concatenation etc. on it...
However, I guess there is so much history behind it that I'll have to
conform to these semantics, putting all paths/fileno/handle in the same
"name" property, and adding an "origin" property telling how to
interpret the "name"...
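To make the ambiguity concrete, here is a minimal sketch using only the standard io module. The describe_origin() helper is hypothetical (it is not part of io) and only illustrates what an "origin" hint could report:

```python
import io
import os
import tempfile


def describe_origin(stream):
    """Hypothetical helper: say how stream.name should be interpreted."""
    name = getattr(stream, "name", None)
    if isinstance(name, int):
        return "fileno"
    if isinstance(name, (str, bytes)):
        return "path"
    return "unknown"


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")

    # Opened from a path: name is the string that was passed in.
    with open(path, "wb") as by_path:
        by_path.write(b"hello")
        origin_from_path = describe_origin(by_path)

    # Opened from a descriptor: name is the integer descriptor itself.
    fd = os.open(path, os.O_RDONLY)
    with io.FileIO(fd, "r") as by_fd:  # closes fd on exit
        origin_from_fd = describe_origin(by_fd)
        name_is_fd = by_fd.name == fd
```

Code that does something like `"log for " + stream.name` works in the first case and raises TypeError in the second.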
>> Methods too would deserve some auto-forwarding. If you want to bufferize
>> a raw stream which also offers size(), times(), lock_file() and other
>> methods, how can these be accessed from a top-level buffering/text
>> stream ?
> I think it's a bad idea. If you forget to implement one of the standard IO
> methods (e.g. seek()), it will get forwarded to the raw stream, but with the
> wrong semantics (because it won't take buffering into account).
> It's better to require the implementor to do the forwarding explicitly if
> desired, IMO.
The problem is, doing that forwarding is quite complicated. IO is a
collection of "core tools for working with streams", but it's currently
not flexible enough to let people customize them too...
For example, if I want to add a new series of methods to all standard
streams, which simply forward calls to new raw stream features, what do
I do? Monkey-patching base classes (RawFileIO, BufferedIOBase...)? Not
a good pattern. Subclassing
That's really redundant...
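One possible shape for such forwarding, as a rough sketch only: a buffered subclass whose __getattr__ hands unknown public names to the raw stream. SizedFileIO, its size() method and ForwardingBufferedReader are hypothetical names made up for this illustration, not existing io classes:

```python
import io
import os
import tempfile


class SizedFileIO(io.FileIO):
    """Hypothetical raw stream with one extra, non-standard method."""

    def size(self):
        return os.fstat(self.fileno()).st_size


class ForwardingBufferedReader(io.BufferedReader):
    """Forwards *unknown* public attributes to the underlying raw stream."""

    def __getattr__(self, name):
        # __getattr__ only runs when normal lookup has failed, so the
        # standard methods implemented by BufferedReader (seek, read, ...)
        # are never forwarded past the buffer.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self.raw, name)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")
    with open(path, "wb") as f:
        f.write(b"0123456789")

    with ForwardingBufferedReader(SizedFileIO(path, "r")) as reader:
        first = reader.read(4)          # handled by the buffered layer
        forwarded_size = reader.size()  # not found here, forwarded to raw
```

This only covers the case where the buffered class already implements every standard method; a layer that forgets one would forward it with the wrong semantics, which is the objection raised above.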
And there are flaws especially around BufferedRandom. This stream
inherits from BufferedWriter and BufferedReader, and overrides some methods.
How do I extend it? I'd want to reuse its methods, but then have
it forward calls to MY buffered classes, not the original BufferedWriter or
BufferedReader classes. Should I modify its __bases__ to edit the
inheritance tree? Handy, but not a good pattern... I'm currently getting
what I want with a triple inheritance (praying for the MRO to be as I
expect), but it's really not straightforward.
Having BufferedRandom as an additional layer would slow down the system,
but allow its reuse with custom buffered writers and readers...
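For reference, the triple inheritance can be sketched against the pure-Python _pyio module. The three My* classes are hypothetical placeholders; the sketch only inspects the resulting method resolution order. Whether BufferedRandom's own methods then reach the custom classes depends on how they call their parents, which this sketch does not check:

```python
import _pyio


class MyBufferedWriter(_pyio.BufferedWriter):
    """Hypothetical customized writer."""


class MyBufferedReader(_pyio.BufferedReader):
    """Hypothetical customized reader."""


class MyBufferedRandom(_pyio.BufferedRandom, MyBufferedWriter, MyBufferedReader):
    """Triple inheritance: stock BufferedRandom plus both custom classes."""


# Direct parents of the stock class, and the linearized lookup order of
# the custom one.
stock_bases = [cls.__name__ for cls in _pyio.BufferedRandom.__bases__]
custom_mro = [cls.__name__ for cls in MyBufferedRandom.__mro__]
```

In this arrangement each custom class is placed just before the stock class it derives from, so a cooperative super() call would reach it first, while an explicit BufferedWriter.write(self, ...) style call would bypass it.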
>> - I feel thread-safety locking and stream status checking are
>> currently overly complicated. All methods are filled with locking calls
>> and CheckClosed() calls, which is both a performance loss (most io
>> streams will have 3 such levels of locking, when 1 would suffice)
> FileIO objects don't have a lock, so there are 2 levels of locking at worst, not
> 3 (and, actually, TextIOWrapper doesn't have a lock either, although perhaps it
> As for the checkClosed() calls, they are probably cheap, especially if they
> bypass regular attribute lookup.
CheckClosed calls are cheap, but they can easily be forgotten in one of
the dozens of methods involved...
My own FileIO class alas needs locking, because for example, on Windows
truncating a file means seeking + setting end of file + restoring pointer.
And TextIOWrapper seems to deserve locks too. Maybe excerpts like this one
really are thread-safe, but a long study would be required to make sure.
    if whence == 2: # seek relative to end of file
        if cookie != 0:
            raise IOError("can't do nonzero end-relative seeks")
        position = self.buffer.seek(0, 2)
        self._snapshot = None
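The truncation case mentioned above (seek, set end of file, restore pointer) can be sketched as follows. LockedStream and truncate_at() are hypothetical names, and io.BytesIO stands in for a real file so the example runs anywhere:

```python
import io
import threading


class LockedStream:
    """Hypothetical wrapper that makes one compound operation atomic."""

    def __init__(self, raw):
        self._raw = raw
        self._lock = threading.Lock()

    def truncate_at(self, size):
        # Three separate steps; without the lock, another thread going
        # through this wrapper could move the pointer between them.
        with self._lock:
            saved = self._raw.tell()
            self._raw.seek(size)
            self._raw.truncate()   # cut at the current position
            self._raw.seek(saved)  # restore the caller's pointer
        return size


stream = io.BytesIO(b"0123456789")
stream.seek(8)
LockedStream(stream).truncate_at(4)
```

The lock only protects callers that go through the wrapper; anything holding a direct reference to the raw stream can still interleave.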
>> Since we're in the mood for nesting streams anyway, why not simply
>> add a "safety stream" on top of each stream chain returned by open()?
>> That layer could gracefully handle mutex locking, CheckClosed() calls,
>> and even, maybe, the attribute/method forwarding I mentioned above.
> It's an interesting idea, but it could also end up slower than the current
> First because you are adding a level of indirection (i.e. additional method
> lookups and method calls).
> Second because currently the locks aren't always taken. For example, in
> BufferedIOReader, we needn't take the lock when the requested data is available
> in our buffer (the GIL already protects us). Having a separate "synchronizing"
> wrapper would forbid such micro-optimizations.
> If you want to experiment with this, you can use iobench (in the Tools
> directory) to measure file IO performance.
There is a chance that my approach is slower, but the gains are so high
in terms of maintainability and ease of use, that I would definitely
Typically, the micro-optimizations you mention can please heavy
programs, but they turn the code into a minefield (maybe that's why they
haven't been put into _pyio :p).
When the order of every instruction matters, when everything is carefully
crafted so that the GIL is sufficient, I personally don't dare touch
There is for sure an important trade-off between speed and robustness
here, but I fear speed has won too much so far (and now that the main
implementation is in C, it's getting really hard to understand).
Maybe I should take the latest _pyio version, and make a fork offering
high-level flexibility and safety, for those who don't need such
high performance?
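As a rough illustration of the "safety stream" idea, assuming a single outermost wrapper that owns the only lock, does the closed check, and forwards everything else. SafetyStream is a hypothetical name, and this is a sketch rather than a proposal for the actual io implementation:

```python
import io
import threading


class SafetyStream:
    """Hypothetical outermost layer: one lock, one closed check, forwarding."""

    def __init__(self, stream):
        self._stream = stream
        self._lock = threading.RLock()

    def __getattr__(self, name):
        # Private names are never forwarded (this also avoids recursion
        # if _stream itself is missing).
        if name.startswith("_"):
            raise AttributeError(name)
        target = getattr(self._stream, name)
        if not callable(target):
            return target  # plain attributes such as .closed or .name

        def guarded(*args, **kwargs):
            with self._lock:
                if name != "close" and self._stream.closed:
                    raise ValueError("I/O operation on closed file")
                return target(*args, **kwargs)

        return guarded


safe = SafetyStream(io.BytesIO())
safe.write(b"abc")
safe.seek(0)
data = safe.read()
safe.close()
try:
    safe.read()
    closed_error = None
except ValueError as exc:
    closed_error = exc
```

Every call pays for an extra lookup, a closure and a lock acquisition, which is the indirection cost raised in the reply above; iobench would be the place to measure it.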
>> - some semantic decisions of the current system are somewhat dangerous.
>> For example, flushing errors occurring on close are swallowed. It seems
>> to me that it's of the utmost importance that the user be warned if the
>> bytes he wrote disappeared before reaching the kernel; shouldn't we
>> firmly enforce a "don't hide errors" policy everywhere in the io module?
> It may be a bug. Can you report it, along with a script or test showcasing it?
It seems a rather deliberate choice (with comments like "#If flush()
fails, just give up"), but yep, I'll file a bug to be sure.
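Whatever close() ends up doing, a caller who cares can flush explicitly first, so that a failure is seen at a well-defined point. A minimal sketch, where FlakyRaw and flush_or_report() are hypothetical names and the write failure is simulated:

```python
import io


class FlakyRaw(io.RawIOBase):
    """Hypothetical raw stream whose writes can be made to fail."""

    def __init__(self):
        super().__init__()
        self.fail = True
        self.received = bytearray()

    def writable(self):
        return True

    def write(self, data):
        if self.fail:
            raise OSError("simulated write failure")
        self.received += bytes(data)
        return len(data)


def flush_or_report(buffered):
    """Flush explicitly and hand back the error instead of losing it."""
    try:
        buffered.flush()
    except OSError as exc:
        return exc
    return None


raw = FlakyRaw()
buffered = io.BufferedWriter(raw)
buffered.write(b"payload")         # small write: stays in the buffer
error = flush_or_report(buffered)  # the simulated failure surfaces here
raw.fail = False                   # let the stream recover
buffered.close()                   # close() flushes again, without error
```

The sketch says nothing about what close() does with a flush error in any particular version; it only shows that the error is available to code that asks for it.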
> I don't think this can be helped though -- I really don't want open()
> to be slowed down or complicated by an attempt to do path
> manipulation. If this matters to the app author they should use
> os.path.abspath() or os.path.realpath() or whatever before calling
On second thought, having more precise "name" or "path" attributes might
give users the impression that they can rely on them, whereas in fact the
filesystem might have been modified a lot during the use of the stream
(even on Windows, where files can actually be renamed/deleted while
> AFAIK, they aren't simple indexes in Windows, and that's partly why
> even file descriptors cannot be safely passed between C runtimes on
> Windows (whereas they can in most unices).
Yep, Windows file descriptors are actually emulated (with bugs...) on
top of native file handles, that's why we can't rely on them for
advanced stream operations.
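The descriptor/handle relationship can be observed from Python with the standard msvcrt module. In this sketch native_handle() is a hypothetical helper; on non-Windows systems it simply returns the descriptor, on the assumption that the descriptor is already the native object there:

```python
import os


def native_handle(fd):
    """Return the OS-level handle behind a C file descriptor.

    On Windows this asks the C runtime through msvcrt.get_osfhandle();
    elsewhere the descriptor is returned unchanged (an assumption made
    for this sketch).
    """
    if os.name == "nt":
        import msvcrt
        return msvcrt.get_osfhandle(fd)
    return fd


read_fd, write_fd = os.pipe()
try:
    handle = native_handle(read_fd)
finally:
    os.close(read_fd)
    os.close(write_fd)
```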