[Python-3000] iostack and sock2

Sun Jun 4 05:52:19 CEST 2006

tomer filiba wrote:
> hi all
> 
> some time ago i wrote this huge post about stackable IO and the
> need for a new socket module. i've made some progress with
> those, and i'd like to receive feedback.
> 
> * a working alpha version of the new socket module (sock2) is
> available for testing and tweaking with at
> http://sebulba.wikispaces.com/project+sock2
> 
> * i'm working on a version of iostack... but i don't expect to make
> a public release until mid july. in the meanwhile, i started a wiki
> page on my site for it (motivation, plans, design):
> http://sebulba.wikispaces.com/project+iostack

Nice, very nice.

Some things that don't appear to have been considered in the iostack design yet:
  - non-blocking IO and timeouts (e.g. on NetworkStreams)
  - interaction with (replacement of?) the select module

Some other random thoughts about the current writeup:

The design appears to implicitly assume that it is best to treat all streams 
as IO streams, and raise an exception if an output operation is accessed on an 
input-only stream (or vice versa). This seems like a reasonable idea to me, 
but it should be mentioned explicitly (e.g. an alternative approach would be to 
define InputStream and OutputStream, and then have an IOStream that inherited 
from both of them).
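To make the alternative concrete, here is a minimal sketch of an InputStream/OutputStream/IOStream split. The class bodies are hypothetical and are not iostack's API. The point is that an input-only stream has no write() method at all, so misuse shows up as an AttributeError, or can be checked up front with isinstance(), instead of surfacing as an IOError at call time.

```python
import io


class InputStream:
    """Hypothetical read-only stream: exposes read() only."""

    def __init__(self, raw):
        self._raw = raw  # any object with a read(n) method

    def read(self, n=-1):
        return self._raw.read(n)


class OutputStream:
    """Hypothetical write-only stream: exposes write() only."""

    def __init__(self, raw):
        self._raw = raw  # any object with a write(data) method

    def write(self, data):
        return self._raw.write(data)


class IOStream(InputStream, OutputStream):
    """Read-write stream: inherits read() and write() from both bases."""


reader = InputStream(io.BytesIO(b"abc"))
print(reader.read())              # b'abc'
print(hasattr(reader, "write"))   # False: no write() to call by mistake
```

The trade-off is more classes in exchange for making the capability visible in the type, rather than discovering it when an operation fails.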

The common Stream API should include a flush() write method, so that 
application code doesn't need to care whether or not it is dealing with 
buffered IO when forcing output to be displayed.
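A minimal sketch of why a universal flush() helps, assuming hypothetical Stream and BufferedStream classes (not iostack's own). The unbuffered base class implements flush() as a pass-through, the buffered subclass writes out whatever it is holding, and application code such as show_prompt() calls flush() without checking which kind of stream it has.

```python
import io


class Stream:
    """Hypothetical unbuffered stream; flush() exists but has nothing to do."""

    def __init__(self, raw):
        self.raw = raw

    def write(self, data):
        return self.raw.write(data)

    def flush(self):
        # Nothing is held at this level; just pass the request downwards.
        if hasattr(self.raw, "flush"):
            self.raw.flush()


class BufferedStream(Stream):
    """Hypothetical buffered stream that holds writes until bufsize is reached."""

    def __init__(self, raw, bufsize=8192):
        super().__init__(raw)
        self.bufsize = bufsize
        self._pending = bytearray()

    def write(self, data):
        self._pending += data
        if len(self._pending) >= self.bufsize:
            self.flush()
        return len(data)

    def flush(self):
        if self._pending:
            self.raw.write(bytes(self._pending))
            self._pending.clear()
        super().flush()


def show_prompt(stream, text):
    # Application code: identical for buffered and unbuffered streams.
    stream.write(text)
    stream.flush()
```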

Any operations that may touch the filesystem or network shouldn't be 
properties - attribute access should never raise IOError (this is a guideline 
that came out of the Path discussion). (e.g. the 'position' property is 
probably a bad idea, because x.position may then raise an IOError)

The stream layer hierarchy needs to be limited to layers that both expose and 
use the normal bytes-based Stream API. A separate stream interface concept is 
needed for something that can be used by the application, but cannot have 
other layers stacked on top of it. Additionally, any "bytes-in-bytes-out" 
transformation operation can be handled as a single codec layer that accepts 
an encoding function and a decoding function. This can then be used for 
compression layers, encryption layers, Golay encoding, A-law companding, AV 
codecs, etc.
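A minimal sketch of such a codec layer, assuming the encode and decode functions are stateless and work chunk by chunk. CodecLayer here is a hypothetical stand-in, and the XOR transform is a toy chosen only because it is its own inverse and preserves length; it is not encryption. Stateful codecs such as streaming compression would need objects like zlib.compressobj() and zlib.decompressobj() rather than plain functions, and a length-changing codec would also need buffering in read().

```python
import io


class CodecLayer:
    """Hypothetical layer: encode() applied on write, decode() on read."""

    def __init__(self, stream, encode, decode):
        self.stream = stream
        self.encode = encode
        self.decode = decode

    def write(self, data):
        self.stream.write(self.encode(data))
        return len(data)

    def read(self, n=-1):
        return self.decode(self.stream.read(n))


def xor_bytes(key):
    """Return a toy byte-wise XOR transform (its own inverse; NOT encryption)."""
    def transform(data):
        return bytes(b ^ key for b in data)
    return transform


raw = io.BytesIO()
toy = xor_bytes(0x5A)
layer = CodecLayer(raw, encode=toy, decode=toy)
layer.write(b"hello")
raw.seek(0)
print(layer.read())  # b'hello'
```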

   StreamLayer
     * ForwardingLayer - forwards all data written or read to another stream
     * BufferingLayer - buffers data using given buffer size
     * CodecLayer - encodes data written, decodes data read

   StreamInterface
     * TextInterface - text oriented interface to a stream
     * BytesInterface - byte oriented interface to a stream
     * RecordInterface - record (struct) oriented interface to a stream
     * ObjectInterface - object (pickle) oriented interface to a stream

The key point about the stream interfaces is that while they will provide a 
common mechanism for getting at the underlying stream, their interfaces are 
otherwise unconstrained. The BytesInterface differs from a normal low-level 
stream primarily in the fact that it *is* line-iterable.

On the topic of line buffering, the Python 2.x IO stack treats binary files as 
line iterable, using '\n' as a line separator (well, more strictly it's a 
record separator, since we're talking about binary files).

There's actually an RFE on SF somewhere about making the record separator 
configurable in the 2.x IO stack (I raised the tracker item ages ago when 
someone else made the suggestion).
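A sketch of a record-iterable BytesInterface with a configurable separator, assuming records are yielded with their separator attached (as Python 2 file iteration does with '\n'). The class body and its chunk_size parameter are hypothetical. Keeping the unconsumed tail between reads means a multi-byte separator such as b"\r\n" is still found when it straddles two chunks.

```python
import io


class BytesInterface:
    """Hypothetical record-iterable wrapper around a bytes stream."""

    def __init__(self, stream, line_sep=b"\n", chunk_size=4096):
        if not line_sep:
            raise ValueError("line_sep must be non-empty")
        self.stream = stream
        self.line_sep = line_sep
        self.chunk_size = chunk_size

    def __iter__(self):
        pending = b""
        while True:
            chunk = self.stream.read(self.chunk_size)
            if not chunk:
                break
            pending += chunk
            # Emit every complete record; keep the tail (which may hold
            # part of a separator) for the next round.
            while True:
                idx = pending.find(self.line_sep)
                if idx < 0:
                    break
                end = idx + len(self.line_sep)
                yield pending[:end]
                pending = pending[end:]
        if pending:
            yield pending  # final record without a trailing separator


records = BytesInterface(io.BytesIO(b"ab\r\ncd\r\nef"), line_sep=b"\r\n", chunk_size=3)
print(list(records))  # [b'ab\r\n', b'cd\r\n', b'ef']
```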

However, the streams produced by iostack's 'file' helper are not currently 
line-iterable. Additionally, the 'textfile' helper tries to handle line 
terminators while the data is still bytes, whereas Unicode defines line endings 
in terms of characters. As I understand it, "\x0A" (LF), "\x0D" (CR), 
"\x0D\x0A" (CRLF), "\x0B" (VT), "\x85" (NEL), "\x0C" (FF), "\u2028" (LS), 
"\u2029" (PS) should all be treated as line terminators as far as Unicode is 
concerned.
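A quick way to see the difference in modern Python 3 (which postdates this thread). str.splitlines() breaks on the Unicode line terminators, while bytes.splitlines() only recognises \n, \r and \r\n, so a bytes-level splitter silently misses NEL, FF, LS and PS. (str.splitlines() also breaks on a few extra control characters, \x1c to \x1e.)

```python
# Mixed line endings, including non-ASCII Unicode terminators.
text = "a\rb\nc\r\nd\x85e\x0cf\u2028g\u2029h"

# Character-level splitting sees every terminator.
print(text.splitlines())  # ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

# Byte-level splitting of the UTF-8 encoding only sees CR, LF and CRLF,
# so NEL, FF, LS and PS all stay inside the last "line".
raw = text.encode("utf-8")
print(len(raw.splitlines()))  # 4
```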

So I think line buffering and making things line iterable should be left to 
the TextInterface and BytesInterface layers. TextInterface would be most 
similar to the current file interface, only working on Unicode strings 
instead of 8-bit strings (as well as using the Unicode definition of what 
constitutes a line ending). BytesInterface would work with binary files, 
returning a bytes object for each record.

So I'd tweak the helper functions to look like:

def file(filename, mode="r", bufsize=-1, line_sep="\n"):
    f = FileStream(filename, mode)
    # a bufsize of 0 or None means unbuffered
    if bufsize:
        f = BufferingLayer(f, bufsize)
    # Use bytes interface to make file line-iterable
    return BytesInterface(f, line_sep)

def textfile(filename, mode="r", bufsize=-1, encoding=None):
    f = FileStream(filename, mode)
    # a bufsize of 0 or None means unbuffered
    if bufsize:
        f = BufferingLayer(f, bufsize)
    # Text interface deals with line terminators correctly
    return TextInterface(f, encoding)
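For comparison, the layering in the textfile() helper maps fairly directly onto classes in Python 3's io module (io.FileIO, io.BufferedReader, io.TextIOWrapper), which postdates this message. The sketch below is read-only and always buffered, since io.BufferedReader rejects a buffer size of 0; textfile_sketch is a hypothetical name.

```python
import io


def textfile_sketch(filename, bufsize=io.DEFAULT_BUFFER_SIZE, encoding="utf-8"):
    """Read-only analogue of textfile(), stacked from io classes."""
    raw = io.FileIO(filename, "r")                          # raw bytes stream
    buffered = io.BufferedReader(raw, buffer_size=bufsize)  # buffering layer
    # Text layer: decodes bytes and handles line endings (universal newlines).
    return io.TextIOWrapper(buffered, encoding=encoding)
```

One difference from the proposal above: TextIOWrapper's universal-newline mode recognises only \n, \r and \r\n, so "\u2028" and the other Unicode terminators do not end a line there.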

> with lots of pretty-formatted info. i remember people saying
> that stating `read(n)` returns exactly `n` bytes is problematic,
> can you elaborate?

I can see that behaviour being seriously annoying when you get to the end of 
the stream. I'd far prefer for the stream to just give me the last bit when I 
ask for it and then tell me *next* time that there isn't anything left. This 
has worked well for a long time with the existing read method of file objects. 
If you want a method with the other behaviour, add a "readexact" API, rather 
than changing the semantics of "read" (although I'd be really curious to hear 
the use case for the other behaviour).
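A sketch of that separate API, written as a hypothetical helper layered on top of an ordinary read() rather than as a change to it. It loops because read() may legitimately return fewer bytes than requested (as socket recv() calls routinely do), and raises EOFError only when the stream ends before n bytes have arrived.

```python
import io


def readexact(stream, n):
    """Hypothetical helper: return exactly n bytes or raise EOFError."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:  # b"" means end of stream
            got = n - remaining
            raise EOFError(f"expected {n} bytes, got {got} before end of stream")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


s = io.BytesIO(b"abcdef")
print(readexact(s, 4))  # b'abcd'
print(s.read(100))      # b'ef'  (plain read() just hands over the last bit)
print(s.read(100))      # b''    (and only then reports that nothing is left)
```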

(Take a look at the s3.recv(100) line in your Sock2 example - how irritating 
would it be for that to raise EOFError because you only got a few bytes?)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org