[Python-3000] Draft PEP for New IO system
rasky at develer.com
Wed Feb 28 09:20:01 CET 2007
[reposting since the first time it didn't get through...]
On 26/02/2007 22.35, Mike Verdone wrote:
> Daniel Stutzbach and I have prepared a draft PEP for the new IO system
> for Python 3000. This document is, hopefully, true to the info that
> Guido wrote on the whiteboards here at PyCon. This is still a draft
> and there's quite a few decisions that need to be made. Feedback is
Thanks for this!
> Raw I/O
> The abstract base class for raw I/O is RawIOBase. It has several
> methods which are wrappers around the appropriate operating system
> call. If one of these functions would not make sense on the object,
> the implementation must raise an IOError exception. For example, if a
> file is opened read-only, the .write() method will raise an IOError.
> As another example, if the object represents a socket, then .seek(),
> .tell(), and .truncate() will raise an IOError.
> .read(n: int) -> bytes
> .readinto(b: bytes) -> int
> .write(b: bytes) -> int
What are the requirements here?
- Can read()/readinto() return *fewer* bytes than specified?
- Can read() return a 0-sized byte object (=no data available)?
- Can read() return *more* bytes than specified (think of a datagram socket or
  a decompressing stream)?
- Can readinto() read *fewer* bytes than specified?
- Can readinto() read zero bytes?
- Should read()/readinto() raise EOFError?
- Can write() write fewer bytes than specified?
- Can write() write zero bytes?
Please, see also the examples at the end of the mail before providing an answer :)
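
To make the short-read and short-write questions concrete, here is a minimal sketch in present-day Python (not the draft API). It assumes the answer to both questions is "yes", so callers have to loop, and it assumes a blocking stream. ChunkyRaw, read_exactly and write_all are hypothetical names used only for illustration.

```python
import io


class ChunkyRaw(io.RawIOBase):
    """Hypothetical raw stream that moves at most `limit` bytes per call."""

    def __init__(self, data=b"", limit=3):
        self._in = bytearray(data)   # bytes still available to read
        self.out = bytearray()       # bytes accepted by write()
        self._limit = limit

    def readable(self):
        return True

    def writable(self):
        return True

    def readinto(self, b):
        n = min(len(b), self._limit, len(self._in))
        b[:n] = self._in[:n]
        del self._in[:n]
        return n                     # 0 means end of stream here

    def write(self, b):
        n = min(len(b), self._limit)
        self.out += bytes(b[:n])
        return n                     # may be fewer than len(b)


def read_exactly(raw, n):
    """Loop over readinto() until n bytes arrive or the stream ends."""
    buf = bytearray(n)
    view = memoryview(buf)
    got = 0
    while got < n:
        k = raw.readinto(view[got:])
        if not k:                    # 0 signals end of stream
            break
        got += k
    return bytes(buf[:got])


def write_all(raw, data):
    """Loop over write() until every byte has been accepted."""
    view = memoryview(data)
    while len(view):
        k = raw.write(view)
        view = view[k:]
```

If the spec instead guaranteed full-size transfers at the raw level, both helper loops would have to move into every RawIOBase implementation.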
> .seek(pos: int, whence: int = 0) -> None
> .tell() -> int
> .truncate(n: int = None) -> None
> .close() -> None
Why should this very low-level basic type define *two* read methods? Assuming
that readinto() is the most primitive, can we have the ABC RawIOBase provide a
default read() method that calls readinto?
Consider providing more ABC/mixins to help implementations.
ReadIOBase/WriteIOBase are pretty obvious:

    class RawIOBase:
        def readable(self): return False
        def writeable(self): return False
        def seekable(self): return False
        def read(self, n): raise IOError
        def readinto(self, b): raise IOError
        def write(self, b): raise IOError
        def seek(self, pos, wh): raise IOError
        def tell(self): raise IOError
        def truncate(self, n=None): raise IOError

    class ReadIOBase(RawIOBase):
        def readable(self): return True
        def read(self, n):
            b = bytes(n)  # whatever
            self.readinto(b)
            return b

    class MyReader(ReadIOBase):
        def readinto(self, b):
            # must implement only this and nothing else
            ...

    class MySpecialReaderWriter(ReadIOBase, WriteIOBase):
        def readinto(self, b):
            ...
        def write(self, b):
            ...

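
As a runnable illustration of the "default read() built on readinto()" idea, here is a small sketch in present-day Python, where bytes is immutable and a bytearray serves as the mutable buffer. The ReadIOBase below is a standalone stand-in for the mixin proposed above, and BytesReader is a hypothetical subclass; neither is part of the draft.

```python
class ReadIOBase:
    """Stand-in mixin: subclasses implement readinto() and nothing else."""

    def readable(self):
        return True

    def readinto(self, b):
        raise NotImplementedError

    def read(self, n):
        buf = bytearray(n)           # mutable scratch buffer
        got = self.readinto(buf)     # may fill fewer than n bytes
        return bytes(buf[:got])


class BytesReader(ReadIOBase):
    """Hypothetical reader over an in-memory bytes object."""

    def __init__(self, data):
        self._data = data
        self._pos = 0

    def readinto(self, b):
        chunk = self._data[self._pos:self._pos + len(b)]
        b[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)
```

Today's io.RawIOBase takes the same approach: its default read() defers to readinto().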
> (should these "is_" functions be attributes instead?
> "file.readable == True")
Yes, I think readable/writeable/seekable/fileno *perfectly* match the good
usage of attributes/properties. They all provide a value without any side
effects, and that value can be computed without O(n)-style work.
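
A minimal sketch of what the attribute spelling could look like, using read-only properties. RawSource and its mode-string convention are hypothetical, chosen only to show the shape. For comparison, the io module that later shipped kept readable(), writable(), seekable() and fileno() as methods.

```python
class RawSource:
    """Hypothetical raw object exposing capabilities as properties."""

    def __init__(self, fd, mode):
        self._fd = fd
        self._mode = mode            # e.g. "r", "w", "a", "r+"

    @property
    def readable(self):
        return "r" in self._mode or "+" in self._mode

    @property
    def writable(self):
        return any(c in self._mode for c in "wa+")

    @property
    def fileno(self):
        return self._fd
```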
> Buffered I/O
> The next layer is the Buffer I/O layer which provides more efficient
> access to file-like objects. The abstract base class for all Buffered
I think you probably want the buffer size to be optionally specified by the
user, for the standard 4 implementations.
> Q: Do we want to mandate in the specification that switching between
> reading to writing on a read-write object implies a .flush()? Or is
> that an implementation convenience that users should not rely on?
I'd be glad if using flush() wasn't a requirement for users of the class. It
always strikes me as an abstraction leak.
> TextIOBase class implementations additionally provide the following methods:
> Read until newline or EOF and return the line.
> Returns an iterator that returns lines from the file (which
> happens to be 'self').
> Same as readline()
> Same as readlinesiter()
Not sure why you need "readlinesiter()" at all. I thought Py3k was disposing
of most of the "fooiter()" functions (thinking of dicts...).
> Another way to do it is as follows (we should pick one or the other):
> .__init__(self, buffer, encoding=None, newline=None)
I think this is clearer. I can't find a good real-world use case for requiring
the two-parameter version.
Now for a real example. Let's say I'm given a readable RawIOBase object.
I'm told that it's a foobar-compressed utf-8 text-file. I have this API available:

    # initialize decompressor
    Foobar()
    # feed compressed bytes and get uncompressed bytes.
    # The uncompressed data can be smaller than, equal to, or larger
    # than the compressed data
    decompress(bytes) -> bytes
    # finish decompression and get tail
    flush() -> bytes

This is basically similar to the way zlib.decompress/flush works. I would like
to wrap the readable RawIOBase object in a way that I obtain a textual
file-like with readline() etc.
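
For reference, this is how the incremental zlib interface mentioned above behaves. The sketch uses only the standard zlib module; the 16-byte chunk size and the sample payload are arbitrary choices for illustration.

```python
import zlib

payload = "first line\nsecond line\n".encode("utf-8") * 50
compressed = zlib.compress(payload)

d = zlib.decompressobj()             # initialize decompressor
out = bytearray()
for i in range(0, len(compressed), 16):
    # Each call may return fewer, the same number of, or more bytes than it was fed.
    out += d.decompress(compressed[i:i + 16])
out += d.flush()                     # finish decompression and get the tail
```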
This is pretty hard to do with the current I/O library (you need to write a
lot of code). It'd be good if the new I/O library made it easier to achieve.
Let's see. I start with a raw I/O reader:

    class FoobarRaw(RawIOBase):
        def __init__(self, raw):
            self.raw = raw
            self._d = Foobar()
            self._buf = bytes()

        # I assume RawIOBase.read() must return the
        # exact number of bytes (unless at the end).
        # I assume RawIOBase.read() raises EOFError when done
        # I assume readinto() does not exist...
        def read(self, n):
            try:
                while len(self._buf) < n:
                    b = self.raw.read(n)
                    self._buf += self._d.decompress(b)
            except EOFError:
                self._buf += self._d.flush()
            d = self._buf[:n]
            self._buf = self._buf[n:]
            if not d:
                raise EOFError
            return d

and complete the job:

    def foobar_open(raw):
        return TextIOWrapper(BufferedReader(FoobarRaw(raw)), encoding="utf-8")
    for L in foobar_open(sock):
        ...
Uhm, looks great!
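
For comparison, here is a runnable sketch of the same layering with today's io module, with zlib standing in for the imaginary foobar codec. InflateRaw and inflate_open are hypothetical names. The sketch assumes a blocking source whose read() returns b"" at the end rather than raising EOFError, which differs from the assumptions in the example above.

```python
import io
import zlib


class InflateRaw(io.RawIOBase):
    """Raw reader that decompresses a zlib stream read from `raw`."""

    def __init__(self, raw, chunk_size=4096):
        self._raw = raw
        self._d = zlib.decompressobj()
        self._buf = b""
        self._eof = False
        self._chunk_size = chunk_size

    def readable(self):
        return True

    def readinto(self, b):
        # Refill until there is data to hand out or the source is exhausted.
        while not self._buf and not self._eof:
            chunk = self._raw.read(self._chunk_size)
            if chunk:
                self._buf = self._d.decompress(chunk)
            else:                    # b"" marks the end of the compressed data
                self._buf = self._d.flush()
                self._eof = True
        n = min(len(b), len(self._buf))
        b[:n] = self._buf[:n]
        self._buf = self._buf[n:]
        return n                     # 0 signals EOF to the buffering layer


def inflate_open(raw):
    return io.TextIOWrapper(io.BufferedReader(InflateRaw(raw)), encoding="utf-8")
```

Only readinto() is implemented; BufferedReader and TextIOWrapper supply the exact-size reads, decoding and readline() on top of it.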
Now, it might be interesting to play with the different semantics of
RawIOBase.read() that I proposed above, and see how the implementation of
For instance (now being radical): why don't we drop the "n" argument
altogether? We could just define it like this:
# Returns a block of data, whose size is implementation-defined
# and may vary between calls. It never returns a zero-sized block.
# Raises EOFError when done.
read() -> bytes
After all, there's a BufferedIO layer to handle buffering and exact-size
reads/writes. If we go this way, the above example is even easier:

    def read(self):
        try:
            b = self.raw.read()  # any size!
            return self._d.decompress(b)
        except EOFError:
            b = self._d.flush()
            if not b:
                raise EOFError
            return b

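
To show how a buffering layer could recover exact-size reads from such a size-less read(), here is a minimal sketch. BlockSource and ExactReader are hypothetical names; the only assumption about the raw object is the contract stated above (non-empty blocks of any size, EOFError when done).

```python
class BlockSource:
    """Hypothetical raw object: read() takes no size, raises EOFError at the end."""

    def __init__(self, blocks):
        self._blocks = list(blocks)

    def read(self):
        if not self._blocks:
            raise EOFError
        return self._blocks.pop(0)


class ExactReader:
    """Hypothetical buffering layer that turns blocks into exact-size reads."""

    def __init__(self, raw):
        self._raw = raw
        self._buf = bytearray()
        self._eof = False

    def read(self, n):
        # Pull blocks until n bytes are buffered or the source is exhausted.
        while len(self._buf) < n and not self._eof:
            try:
                self._buf += self._raw.read()
            except EOFError:
                self._eof = True
        data = bytes(self._buf[:n])
        del self._buf[:n]
        return data                  # shorter than n only at end of stream
```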
It would also work well for sockets, since they would return exactly the
buffer of data arrived from the network, and simply block once if there's not