[Python-3000] Comment on iostack library

Fri Sep 1 13:34:42 CEST 2006

"tomer filiba" <tomerfiliba at gmail.com> writes:

>> Encoding conversion and newline conversion should be performed a
>> block at a time, below buffering, so not only I/O syscalls, but
>> also invocations of the recoding machinery are amortized by
>> buffering.
>
> you have a good point, which i also stumbled upon when implementing
> the TextInterface. but how would you suggest to solve it?

I've designed and implemented this for my language, but I'm not sure
that you will like it because it's quite different from the Python
tradition.

The block-reading function appends data to the end of the supplied
buffer, up to the specified size (or without limit), and also reports
whether it reached the end of the data. The block-writing function
removes data from the beginning of the supplied buffer, up to the
supplied size (or the whole buffer), and is told how to flush; this
includes whether this is the end of the data. Both functions are
allowed to read/write less than requested.
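A minimal Python sketch of the two block interfaces, assuming bytearray buffers and in-memory endpoints. The class names MemorySource and MemorySink, the method names read_block/write_block and the flush value "end" are hypothetical, chosen only for illustration. Both ends deliberately transfer at most `chunk` bytes per call, to show that short reads and writes are allowed.

```python
class MemorySource:
    """Hypothetical byte source with the block-reading interface."""

    def __init__(self, data, chunk=4):
        self._data = bytes(data)
        self._pos = 0
        self._chunk = chunk  # cap per call, to simulate short reads

    def read_block(self, buf, size=None):
        """Append up to `size` bytes (None = no limit) to the end of `buf`.
        Returns True when the end of the data has been reached."""
        n = self._chunk if size is None else min(size, self._chunk)
        piece = self._data[self._pos:self._pos + n]
        self._pos += len(piece)
        buf.extend(piece)
        return self._pos >= len(self._data)


class MemorySink:
    """Hypothetical byte sink with the block-writing interface."""

    def __init__(self, chunk=4):
        self.data = bytearray()
        self.closed = False
        self._chunk = chunk  # cap per call, to simulate short writes

    def write_block(self, buf, size=None, flush=None):
        """Remove up to `size` bytes (None = whole buffer) from the front of
        `buf`. flush="end" (a hypothetical value) marks the end of the data."""
        n = len(buf) if size is None else min(size, len(buf))
        n = min(n, self._chunk)
        self.data.extend(buf[:n])
        del buf[:n]
        if flush == "end" and not buf:
            self.closed = True
```

A caller simply loops: on the reading side until read_block reports the end, on the writing side until its own buffer is empty.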

The recoding engine moves data from the beginning of an input buffer
to the end of an output buffer. The block recoding function takes size
parameters similar to those above, and a flushing parameter. It returns
True on output overflow, i.e. when it stopped because it needs more
room in the output rather than because it needs more input. If the data
at the end of the input looks incomplete (e.g. a truncated multibyte
sequence), it leaves it unconverted in the input buffer, unless it is
told that this is the last block, in which case it fails.
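A sketch of one such recoding function for UTF-8 decoding, in Python. The name decode_utf8_block and the buffer types (a bytearray in, a list of characters out) are assumptions. It creates a fresh codecs incremental decoder per call only to learn how many trailing bytes are incomplete (getstate() returns them), so the leftover stays in the caller's input buffer instead of being hidden inside the decoder. With final=True an incomplete tail raises UnicodeDecodeError.

```python
import codecs


def decode_utf8_block(inbuf, outbuf, max_out=None, final=False):
    """Move UTF-8 bytes from the front of `inbuf` (a bytearray) to the end of
    `outbuf` (a list of characters). Returns True on output overflow."""
    dec = codecs.getincrementaldecoder("utf-8")()
    # Raises UnicodeDecodeError if final=True and the data ends mid-sequence.
    text = dec.decode(bytes(inbuf), final)
    pending, _ = dec.getstate()  # bytes of an incomplete trailing sequence
    consumed = len(inbuf) - len(pending)
    overflow = max_out is not None and len(text) > max_out
    if overflow:
        text = text[:max_out]
        # Strictly decoded text round-trips, so re-encoding gives its byte length.
        consumed = len(text.encode("utf-8"))
    outbuf.extend(text)
    del inbuf[:consumed]  # only what was actually converted leaves the input
    return overflow
```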

Both decoding input streams and encoding output streams have a
persistent buffer in the format corresponding to their low end,
i.e. a byte buffer when the stream sits at the boundary between bytes
and characters.

This design allows everything to be plugged together, including the
cases where recoding changes sizes significantly (compression/decompression).
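To illustrate a size-changing layer, here is the same calling convention wrapped around Python's zlib.decompressobj, a swapped-in example rather than anything from the original design. The name inflate_block is hypothetical. Unlike a character decoder, a decompressor keeps internal state, so the object is passed in; input it holds back because of the output limit appears in its unconsumed_tail attribute and stays in the caller's buffer. Overflow is reported conservatively whenever the output limit is filled exactly, which at worst costs one extra call that produces nothing.

```python
import zlib


def inflate_block(dec, inbuf, outbuf, max_out=None, final=False):
    """Move compressed bytes from the front of `inbuf` to the end of `outbuf`
    (both bytearrays) using a zlib.decompressobj `dec`.
    Returns True on output overflow."""
    data = bytes(inbuf)
    out = dec.decompress(data, max_out or 0)  # max_length=0 means "no limit"
    # Input not processed because the output limit was hit stays in `inbuf`.
    del inbuf[:len(data) - len(dec.unconsumed_tail)]
    outbuf.extend(out)
    overflow = bool(max_out) and len(out) == max_out
    if final and not overflow and not dec.eof:
        raise ValueError("compressed stream is truncated")
    return overflow
```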

It also allows the reading/writing process to be interrupted without
breaking the consistency of the buffers' state, as long as each
primitive reading/writing operation is atomic, i.e. anything it
removes from the input buffer is converted and put in the output
buffer. Data not yet processed by the remaining layers stays in
their respective buffers.

For example, reading a block from a decoding stream works like this:
1. If there was no overflow previously, read more data from the
   underlying stream to the internal buffer, up to the supplied
   maximum size.
2. Decode data from the internal buffer to the supplied output buffer,
   up to the supplied maximum size. Tell the recoding engine that this
   is the last piece if there was no overflow previously and reading
   from the underlying stream reached the end.
3. Return True (i.e. end of input) if there was no overflow now and
   reading from the underlying stream reached the end.
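The three steps above, sketched in Python. DecodingReader, ChunkedBytes and the compact _decode_utf8 helper are hypothetical names: the helper is a UTF-8 recoder with the interface described earlier (it returns True on output overflow), and ChunkedBytes stands in for an underlying byte stream that returns at most three bytes per call. Apart from its byte buffer, the reader only remembers whether the previous decode overflowed.

```python
import codecs


def _decode_utf8(inbuf, outbuf, max_out=None, final=False):
    """Compact UTF-8 recoder: bytes from the front of `inbuf` go to the end of
    `outbuf`; an incomplete tail stays in `inbuf`. Returns True on overflow."""
    dec = codecs.getincrementaldecoder("utf-8")()
    text = dec.decode(bytes(inbuf), final)
    consumed = len(inbuf) - len(dec.getstate()[0])
    overflow = max_out is not None and len(text) > max_out
    if overflow:
        text = text[:max_out]
        consumed = len(text.encode("utf-8"))
    outbuf.extend(text)
    del inbuf[:consumed]
    return overflow


class ChunkedBytes:
    """Hypothetical underlying byte stream returning at most `chunk` bytes per call."""

    def __init__(self, data, chunk=3):
        self.data, self.pos, self.chunk = bytes(data), 0, chunk

    def read_block(self, buf, size=None):
        n = self.chunk if size is None else min(size, self.chunk)
        buf.extend(self.data[self.pos:self.pos + n])
        self.pos = min(self.pos + n, len(self.data))
        return self.pos >= len(self.data)


class DecodingReader:
    def __init__(self, raw):
        self.raw = raw
        self.buf = bytearray()  # persistent buffer in the format of the low end
        self.overflow = False   # did the previous decode stop for lack of room?

    def read_block(self, out, size=None):
        # Step 1: fetch more bytes only if there was no overflow previously.
        raw_eof = False
        if not self.overflow:
            raw_eof = self.raw.read_block(self.buf, size)
        # Step 2: decode; it is the last piece only if step 1 ran and hit the end.
        self.overflow = _decode_utf8(self.buf, out, size, final=raw_eof)
        # Step 3: end of input if nothing was held back and the raw stream ended.
        return raw_eof and not self.overflow
```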

Writing a block to an encoding stream is simpler:
1. Encode data from the supplied input buffer to the internal buffer.
2. Write data from the internal buffer to the output stream.
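The encoding side, sketched the same way. EncodingWriter and ChunkSink are hypothetical names. Characters are taken from the front of a list, encoded with str.encode into the writer's persistent byte buffer, and then handed to an underlying sink that may accept only part of them; whatever the sink does not take stays in the writer's buffer for the next call. Encoding whole Python characters to UTF-8 never leaves a partial sequence, so step 1 needs no incomplete-data handling here.

```python
class ChunkSink:
    """Hypothetical underlying byte stream accepting at most `chunk` bytes per call."""

    def __init__(self, chunk=3):
        self.data = bytearray()
        self.chunk = chunk

    def write_block(self, buf, size=None, flush=None):
        limit = len(buf) if size is None else min(size, len(buf))
        n = min(limit, self.chunk)
        self.data.extend(buf[:n])
        del buf[:n]


class EncodingWriter:
    def __init__(self, raw, encoding="utf-8"):
        self.raw = raw
        self.encoding = encoding
        self.buf = bytearray()  # persistent byte buffer at the low end

    def write_block(self, chars, size=None, flush=None):
        # Step 1: encode from the front of the caller's buffer into ours.
        n = len(chars) if size is None else min(size, len(chars))
        self.buf.extend("".join(chars[:n]).encode(self.encoding))
        del chars[:n]
        # Step 2: write from our buffer to the underlying stream (maybe partially).
        self.raw.write_block(self.buf, None, flush)
```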

Buffered streams are typically put on top of the stack. They support
reading a line at a time, unlimited lookahead and unlimited unreading,
and writing that is guaranteed not to leave anything in the buffer it
is writing from.
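A sketch of the reading side of such a buffered stream, in Python, assuming characters are kept in a list; BufferedReader, StrSource, peek, unread and readline are hypothetical names. Lookahead and unreading are unlimited simply because the buffer grows as needed: peek pulls blocks until enough characters are available, and unread pushes characters back in front. readline assumes "\n" line endings.

```python
class StrSource:
    """Hypothetical character source with the block-reading interface."""

    def __init__(self, text, chunk=4):
        self.text, self.pos, self.chunk = text, 0, chunk

    def read_block(self, out, size=None):
        n = self.chunk if size is None else min(size, self.chunk)
        out.extend(self.text[self.pos:self.pos + n])
        self.pos = min(self.pos + n, len(self.text))
        return self.pos >= len(self.text)


class BufferedReader:
    def __init__(self, src):
        self.src = src
        self.buf = []       # characters not yet consumed, oldest first
        self.eof = False

    def _fill(self):
        """Append one more block; returns False if the source was already exhausted."""
        if self.eof:
            return False
        self.eof = self.src.read_block(self.buf)
        return True

    def peek(self, n):
        """Unlimited lookahead: up to n characters, without consuming them."""
        while len(self.buf) < n and self._fill():
            pass
        return "".join(self.buf[:n])

    def unread(self, text):
        """Unlimited unreading: push characters back in front of the buffer."""
        self.buf[:0] = text

    def readline(self):
        """Read up to and including the next newline, or to the end of data."""
        i = 0
        while True:
            try:
                j = self.buf.index("\n", i)
                break
            except ValueError:
                i = len(self.buf)
                if not self._fill():
                    j = len(self.buf) - 1  # no newline left: take everything
                    break
        line = "".join(self.buf[:j + 1])
        del self.buf[:j + 1]
        return line
```

The writing side needs no extra buffer logic of its own for the guarantee above: it keeps calling the underlying write_block until the caller's buffer is empty.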

Newlines are converted by a separate layer. The buffered stream
assumes "\n" endings.
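Such a newline layer fits the recoding interface as well. Below is a hypothetical crlf_to_lf_block that turns CR LF into LF; treating CR LF as the external convention is an assumption of this sketch. It shows why a recoder must be able to leave data in its input: a CR at the very end of a block may be the first half of CR LF, so it is kept back unless final is set. Here a lone CR is passed through unchanged rather than treated as an error.

```python
def crlf_to_lf_block(inbuf, outbuf, max_out=None, final=False):
    """Move characters from the front of `inbuf` to the end of `outbuf`
    (both lists of characters), turning CR LF into LF.
    Returns True on output overflow."""
    i = produced = 0
    overflow = False
    while i < len(inbuf):
        ch = inbuf[i]
        if ch == "\r" and i + 1 == len(inbuf) and not final:
            break  # possibly the first half of CR LF: wait for more input
        if max_out is not None and produced >= max_out:
            overflow = True  # stopped for lack of output room
            break
        if ch == "\r" and i + 1 < len(inbuf) and inbuf[i + 1] == "\n":
            i += 1  # swallow the CR of a CR LF pair
            ch = "\n"
        outbuf.append(ch)
        produced += 1
        i += 1
    del inbuf[:i]  # unconverted characters stay in the input buffer
    return overflow
```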

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/