[Python-Dev] IO module precisions and exception hierarchy
chambon.pascal at gmail.com
Sun Sep 27 10:20:23 CEST 2009
Found in the current io PEP:
Q: Do we want to mandate in the specification that switching between
reading and writing on a read-write object implies a .flush()? Or is
that an implementation convenience that users should not rely on?
-> It seems that the only thing that matters is that file pointer positions
and the bytes/characters read are always the ones the user expects, as if
there were no buffering. So flushing or not may remain a non-mandatory
behaviour, as long as the buffered streams ensure this data integrity.
E.g. if a user opens a file in r/w mode, writes two bytes to it (which
stay buffered), and then reads 2 bytes, the two bytes read should of
course be those in range [2:4], even though the underlying file pointer
would, due to Python's buffering, still be at index 0.
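
To make the expectation above concrete, here is a small sketch using the buffered read/write stream that open() returns in "r+b" mode. The temporary file, the sample contents and the function name are only assumptions for the illustration; nothing here says whether a flush happens internally, only what the user observes.

```python
import os
import tempfile


def read_after_buffered_write():
    """Write 2 bytes, then read 2 bytes, on one buffered read/write stream."""
    fd, path = tempfile.mkstemp()
    try:
        # Prepare a 6-byte file to work on.
        with os.fdopen(fd, "wb") as f:
            f.write(b"abcdef")
        # "r+b" gives a buffered stream that supports both reads and writes.
        with open(path, "r+b") as f:
            f.write(b"XY")       # may well stay in the write buffer
            got = f.read(2)      # should be the bytes at [2:4]
            pos = f.tell()       # logical position as seen by the user
        with open(path, "rb") as f:
            final = f.read()
        return got, pos, final
    finally:
        os.remove(path)


got, pos, final = read_after_buffered_write()
print(got, pos, final)
```

With the io module as it behaves in current CPython, the read returns the bytes at [2:4] and the write is not lost, whatever the buffering does behind the scenes.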
Q from me: What happens in read/write text files when overwriting a
three-byte character with a single-byte character? Or, conversely,
when a single Chinese character overwrites 3 ASCII characters in a UTF-8
file? Is there any mechanism designed to avoid this data corruption? Or
should TextIO classes forbid read+write streams?
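
A minimal sketch of the two cases asked about, assuming a UTF-8 file opened in "r+" text mode and overwritten from position 0. The helper names and sample strings are hypothetical, chosen only for the illustration.

```python
import os
import tempfile


def overwrite_start(initial_text, new_text):
    """Overwrite the start of a UTF-8 text file; return the resulting raw bytes."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(initial_text.encode("utf-8"))
        # Read/write text stream, positioned at the start of the file.
        with open(path, "r+", encoding="utf-8", newline="") as f:
            f.write(new_text)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Case 1: a three-byte character (U+4E2D) overwritten by one ASCII character.
shorter = overwrite_start("\u4e2dabc", "x")
# Case 2: three ASCII characters overwritten by one three-byte character.
longer = overwrite_start("abcdef", "\u4e2d")

print(shorter, is_valid_utf8(shorter))
print(longer, is_valid_utf8(longer))
```

In the first case the two leftover continuation bytes make the file undecodable; in the second the file stays valid, but one written character has silently replaced three.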
IO Exceptions:
Currently, the situation is kind of fuzzy around EnvironmentError:
* OSError represents errors reported by the OS via errno.h error codes
(as mirrored in the Python "errno" module).
errno.h errors (fewer than 125 error codes) seem to cover the whole
of *nix system errors. However, Windows has many more system errors
(15000+). So Windows errors, when they can't be mapped to one of the
errno errors, are raised as "WindowsError" instances (a subclass of
OSError), with the special attribute "winerror" indicating that win32
error code.
* IOErrors are "errors raised because of I/O problems", but they use
errno codes, like OSError.
Thus, at the moment IOErrors rather have the semantics of a "particular
case of OSError", and it's kind of confusing to have them remain in
their own separate tree... Furthermore, OSErrors are often used where
IOErrors would fit perfectly, e.g. in low-level I/O functions of the OS.
Since OSErrors and IOErrors are slightly mixed up when we deal with IO
operations, maybe the easiest way to make things clearer would be to push
the already existing designs to their limits:
- the os module should only raise OSErrors, whatever the OS operation
involved (maybe that's already the case in CPython, isn't it?)
- the io module should only raise IOErrors and subclasses thereof, so that
devs can easily take measures depending on the cause of the I/O failure
(except for 1 OSError exception, that's already the case in _fileio)
- other modules referring to I/O might keep their current (fuzzy)
behaviour, since they're more platform-specific, and should in the end
be replaced by a cross-platform solution (at least I'd love that to happen)
Up to that point, there would be no real benefit for the user, compared to
catching EnvironmentErrors as most people probably do. But the sweet thing
would be to offer a concise but meaningful IOError hierarchy, so that
we can easily handle the most specific errors gracefully (having a full
disk is not the same level of gravity as simply having another process
locking your target file).
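
For comparison, this is roughly what a caller has to do without such a hierarchy: inspect errno by hand. It is only a sketch; the function name, the severity labels and the choice of EACCES/EAGAIN as "probably a lock" are my own assumptions, not anything the io module defines.

```python
import errno


def classify_io_failure(exc):
    """Return a coarse severity label for an OSError/IOError, based on errno."""
    if exc.errno == errno.ENOSPC:
        # Disk full: retrying will not help.
        return "fatal"
    if exc.errno in (errno.EACCES, errno.EAGAIN):
        # Assumption: access denied or a lock held elsewhere; may be worth a retry.
        return "retry"
    return "unknown"


print(classify_io_failure(OSError(errno.ENOSPC, "No space left on device")))
print(classify_io_failure(OSError(errno.EAGAIN, "Resource temporarily unavailable")))
```

Dedicated subclasses would turn each of these branches into an ordinary except clause.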
Here is a very rough beginning of an IOError hierarchy. I'd like to have
people's opinions on the relevance of these, as well as on what other
exceptions should be distinguished from basic IOErrors.
+-InvalidStreamError (eg. we try to write on a stream opened in
+-PermissionError (mostly *nix chmod stuff)
+-MaxFileSizeError (maybe hard to implement, happens when we exceed
4 GB on FAT32 and the like...)
+-InvalidFileNameError (filepath max lengths, or "? / : " characters
in a Windows file name...)
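
The hierarchy above could be sketched as plain subclasses of IOError, plus a helper that refines a generic error from its errno. Everything here is hypothetical: the errno-to-class mapping is my own guess at reasonable candidates, refine() is an invented helper, and the permission class is called FilePermissionError in the sketch only to avoid clashing with a builtin of the proposed name in current Python versions.

```python
import errno


class InvalidStreamError(IOError):
    """The operation is not valid for this stream."""


class FilePermissionError(IOError):
    """Stands in for the proposed PermissionError."""


class MaxFileSizeError(IOError):
    """The file would exceed the maximum size allowed."""


class InvalidFileNameError(IOError):
    """The file name is too long or contains forbidden characters."""


# Assumed mapping from errno codes to the proposed classes.
_ERRNO_TO_CLASS = {
    errno.EBADF: InvalidStreamError,
    errno.EACCES: FilePermissionError,
    errno.EPERM: FilePermissionError,
    errno.EFBIG: MaxFileSizeError,
    errno.ENAMETOOLONG: InvalidFileNameError,
}


def refine(exc):
    """Return a more specific exception for exc, or exc itself if none applies."""
    cls = _ERRNO_TO_CLASS.get(exc.errno)
    if cls is None:
        return exc
    return cls(exc.errno, exc.strerror)


refined = refine(OSError(errno.EFBIG, "File too large"))
print(type(refined).__name__, refined.errno == errno.EFBIG)
```

Callers could then write "except MaxFileSizeError:" while code that catches plain IOError keeps working, since every class still derives from it.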