[Python-Dev] Fuzziness in io module specs - PEP update proposition V2

Sun Sep 27 16:51:34 CEST 2009

Pascal Chambon wrote:
> Hello
> 
> Below is a corrected version of the PEP update, adding the start/end 
> indexes proposition and fixing function signatures. Does anyone 
> disagree with these specifications? Or can we consider them a target 
> for the next versions of the io module?
> I would have no problem implementing this behaviour in my own pure 
> Python FileIO system; however, if someone is willing to patch the _fileio 
> implementation, it'd save a lot of time. I most probably won't have the 
> means to set up a C compilation environment under Windows and Linux, and 
> properly update/test this, before January (when I go freelance...).
> 
> I'm starting another thread on the other IO points still to be discussed B-)
> 
> Regards,
> Pascal
> 
> ================ PEP UPDATE for new I/O system - v2 ===========
> 
> **Truncate and file pointer semantics**
> 
> Rationale :
> 
> The current implementation of truncate() always moves the file pointer to 
> the new end of file.
> 
> This behaviour is useful for compatibility when the file has been 
> reduced and the file pointer is now past its end, since some platforms 
> might require 0 <= filepointer <= filesize.
> 
> However, there are several arguments against these semantics:
> 
>     * Most common standards (POSIX, Win32…) allow the file pointer to be
>       past the end of file, and define the behaviour of the other stream
>       methods in this case
>     * In many cases, there is no reason to move the file pointer when
>       truncating (if we're extending the file, or reducing it without
>       going below the file pointer)
>     * Making 0 <= filepointer <= filesize a global rule of the Python IO
>       module doesn't seem possible, since it would require changing the
>       semantics of other methods (e.g. seek() would have to raise
>       exceptions or silently disobey when asked to move the file pointer
>       past the end of file), and it would lead to incoherent situations
>       when files are accessed concurrently without locking (what if
>       another process truncates the file you're writing to 0 bytes?)
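To illustrate the first point: on a POSIX system the low-level os functions already let the file pointer sit past the end of file. This minimal sketch (using a throwaway temporary file; the variable names are only illustrative) seeks 3 bytes past EOF, gets nothing back when reading there, then writes, which zero-fills the gap.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"abc")                  # file is now 3 bytes long
    pos = os.lseek(fd, 6, os.SEEK_SET)    # move 3 bytes past EOF: allowed
    past_eof = os.read(fd, 10)            # reading there returns b""
    os.write(fd, b"Z")                    # writing there extends the file
    os.lseek(fd, 0, os.SEEK_SET)
    contents = os.read(fd, 100)           # the gap reads back as zero bytes
finally:
    os.close(fd)
    os.remove(path)

print(pos, past_eof, contents)            # 6 b'' b'abc\x00\x00\x00Z'
```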
> 
> So here are the proposed semantics, which match established conventions:
> 
> *IOBase.truncate(n: int = None) -> int*
> 
> Resizes the file to the size specified by the non-negative integer n, or to 
> the current file pointer position if n is None.
> 
> The file must be opened with write permissions.
> 
> If the file was previously larger than size, the extra data is discarded.
> If the file was previously shorter than size, its size is increased, and
> the extended area appears as if it were zero-filled.
> 
Instead of "than size", perhaps "than n".

> In any case, the file pointer is left unchanged, and may point beyond
> the end of file.
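A quick sketch of what these truncate() semantics look like from Python code, run against a temporary file on a POSIX system (recent CPython 3 releases document truncate() as leaving the stream position unchanged, which is the behaviour proposed here). Opening with buffering=0 gives a raw FileIO object.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, "r+b", buffering=0) as f:   # unbuffered: a raw FileIO object
        f.write(b"0123456789")                  # file pointer is now at 10
        new_size = f.truncate(4)                # shrink below the file pointer
        pos_after_shrink = f.tell()             # still 10, past the new EOF
        f.truncate(8)                           # extend: new area reads as zeros
        f.seek(0)
        contents = f.read()
finally:
    os.remove(path)

print(new_size, pos_after_shrink, contents)     # 4 10 b'0123\x00\x00\x00\x00'
```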
> 
> Note: trying to read past the end of file returns an empty string, and
> trying to write past the end of file extends it, zero-filling the gap. On
> the rare platforms that don't allow the file pointer to be beyond the end
> of file, all these behaviours shall be emulated by internally storing the
> "wanted" file pointer position (silently extending the file, if
> necessary, when a write operation occurs).
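The emulation described in the note could look roughly like this sketch. FakePointerFile is a hypothetical name, and io.BytesIO stands in for the underlying file on such a platform: the wrapper keeps the "wanted" position itself and only extends the underlying data when a write (or an extending truncate) actually happens.

```python
import io


class FakePointerFile:
    """Hypothetical wrapper: stores the wanted position itself, so the
    underlying stream's pointer never has to go past end of file."""

    def __init__(self, raw):
        self._raw = raw                        # any seekable binary stream
        self._pos = raw.seek(0, io.SEEK_CUR)

    def _size(self):
        return self._raw.seek(0, io.SEEK_END)  # leaves raw pointer at EOF

    def seek(self, pos):
        self._pos = pos                        # may be past EOF; nothing touched yet
        return self._pos

    def tell(self):
        return self._pos

    def read(self, n):
        if self._pos >= self._size():
            return b""                         # past EOF: empty result
        self._raw.seek(self._pos)
        data = self._raw.read(n)
        self._pos += len(data)
        return data

    def write(self, data):
        size = self._size()
        if self._pos > size:                   # zero-fill the gap first
            self._raw.write(b"\0" * (self._pos - size))
        self._raw.seek(self._pos)
        written = self._raw.write(data)
        self._pos += written
        return written

    def truncate(self, n=None):
        if n is None:
            n = self._pos
        size = self._size()
        if n > size:
            self._raw.write(b"\0" * (n - size))  # extend with zeros
        else:
            self._raw.truncate(n)
        return n                               # self._pos deliberately unchanged


raw = io.BytesIO(b"0123456789")
f = FakePointerFile(raw)
f.seek(10)
f.truncate(4)          # data shrinks to 4 bytes; f.tell() is still 10
f.read(5)              # b"" since we are past end of file
f.write(b"X")          # zero-fills offsets 4..9, then writes at offset 10
print(raw.getvalue())  # b'0123\x00\x00\x00\x00\x00\x00X'
```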
> 
>
> 
> *Proposed documentation updates*
> 
> *RawIOBase*.read(n: int) -> bytes
> 
> Read up to n bytes from the object and return them. Fewer than n bytes
> may be returned if the operating system call returns fewer than n bytes.
> If 0 bytes are returned, and n was not 0, this indicates end of file. If
> the object is in non-blocking mode and no bytes are available, the call
> returns None.
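The three possible outcomes of read() (data, b"" at end of file, None when non-blocking with nothing available) are easiest to see with a pipe. In this sketch read_up_to is a hypothetical helper, and os.set_blocking makes it POSIX-only as written.

```python
import io
import os


def read_up_to(raw, n):
    """Hypothetical helper: call raw.read() repeatedly until n bytes are
    collected, end of file is hit, or a non-blocking stream has no data."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = raw.read(remaining)
        if chunk is None:        # non-blocking mode, nothing available right now
            break
        if chunk == b"":         # 0 bytes for a non-zero request: end of file
            break
        chunks.append(chunk)     # short reads are normal; just keep going
        remaining -= len(chunk)
    return b"".join(chunks)


r, w = os.pipe()
os.set_blocking(r, False)        # make the read end non-blocking
raw = io.FileIO(r, "r")          # raw (unbuffered) stream over the pipe

print(raw.read(10))              # None: nothing has been written yet
os.write(w, b"abc")
print(read_up_to(raw, 10))       # b'abc': a short read, then None again
os.close(w)
print(raw.read(10))              # b'': writer closed, so end of file
raw.close()
```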
> 
> 
> *RawIOBase*.readinto(b: bytearray, [start: int = None], [end: int = None]) -> int
> 
> start and end are used as slice indexes, so that the part of the bytearray 
> actually used is range = b[start:end] (or b[start:], b[:end] or 
> b[:], depending on which arguments are not None).
> 
> Read up to len(range) bytes from the object and store them in b, returning
> the number of bytes read. Like .read, fewer than len(range) bytes may be
> read, and 0 indicates end of file if len(range) is not 0.
> None is returned if a non-blocking object has no bytes available. The 
> length of b is never changed.
> 
Should an exception be raised if start and/or end are out of range?
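For illustration, the proposed start/end arguments can be emulated with the current readinto() by passing it a memoryview slice. readinto_range is a hypothetical name, and io.BytesIO stands in for a raw stream; the bytearray's length never changes, since only the slice is written to.

```python
import io


def readinto_range(raw, b, start=None, end=None):
    """Hypothetical helper emulating the proposed readinto(b, start, end):
    read into the region b[start:end] without changing len(b)."""
    # memoryview slicing follows the same rules as b[start:end]
    return raw.readinto(memoryview(b)[start:end])


buf = bytearray(8)
raw = io.BytesIO(b"hello world")
n = readinto_range(raw, buf, 2, 5)
print(n, buf)    # 3 bytearray(b'\x00\x00hel\x00\x00\x00')
n = readinto_range(raw, buf, 6, 100)
print(n, buf)    # 2 bytearray(b'\x00\x00hel\x00lo')
```

With this approach an out-of-range end index is silently clamped, exactly like ordinary slicing; an explicit check would be needed if the method should raise instead.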