[Python-3000] iostack and Oh Oh

Fri Nov 24 16:18:24 CET 2006

recap: we agreed py3k needs a unified, better organized and unbuffered new
io stack. i began working on the subject several months back, and we had
long discussions on what its design should be.

you can see where we got to on the following page (although it's not up to date):
http://sebulba.wikispaces.com/project+iostack+v2

i mostly neglected it in the past months, and got back to it only now.
the problem is, i got confused by all the current OO/generics/interfaces
talks.

back in the day, we said all stream objects would support read() and
write(), and streams with limited functionality (readonly, writeonly) would
be "queriable" using the following properties:
* writable
* readable
* seekable

i.e.,
>>> f = FileStream("somefile.txt", "r")
>>> f.readable
True
>>> f.writable
False
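
a minimal sketch of the queriable approach in present-day python, just to make
the idea concrete. QueriableStream and its mode handling are hypothetical names
invented for this sketch, not part of the iostack design:

```python
import io


class QueriableStream:
    """hypothetical wrapper that exposes capabilities as properties."""

    def __init__(self, raw, mode):
        self._raw = raw      # any object with read()/write()/seekable()
        self._mode = mode    # "r", "w" or "r+" (assumed mode strings)

    @property
    def readable(self):
        return self._mode in ("r", "r+")

    @property
    def writable(self):
        return self._mode in ("w", "r+")

    @property
    def seekable(self):
        return self._raw.seekable()

    def read(self, count):
        # every stream has read(), but it fails if the capability is missing
        if not self.readable:
            raise IOError("stream is not readable")
        return self._raw.read(count)

    def write(self, data):
        if not self.writable:
            raise IOError("stream is not writable")
        return self._raw.write(data)


f = QueriableStream(io.BytesIO(b"hello world"), "r")
```

callers check f.readable / f.writable up front, or just call the method and
handle the error.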

but in light of the current discussion, maybe we'd prefer a
java-like approach, separating the stream into readers and writers
over some fundamental "source".

i.e., a source would be a socket, file, StringIO, whatever, over which
we'll put a reader/writer layer. it may work well for uni-directional streams,
but for sockets it causes some overhead:
>>> s = socket.socket(...)
>>> sr = SocketReader(s)
>>> sw = SocketWriter(s)
>>> sr.read(5)
>>> sw.write("hello")

you have to construct both a writer and a reader to work properly with
a socket, and use a different object for different operations...
doesn't feel right to me.

also, it would be a huge headache to keep in sync the buffers of
both reader/writer, when a source is opened for both read and write...
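
a rough sketch of the reader/writer split over one socket, to show the
two-objects-per-connection overhead. the SocketReader and SocketWriter bodies
are assumptions made for this sketch (unbuffered, so there is nothing to keep
in sync yet); socket.socketpair() is only used to get a connected pair to test
with:

```python
import socket


class SocketReader:
    """hypothetical read-only layer over a socket source."""

    def __init__(self, sock):
        self._sock = sock

    def read(self, count):
        # keep calling recv() until count bytes arrived or the peer closed
        chunks = []
        while count > 0:
            chunk = self._sock.recv(count)
            if not chunk:
                break
            chunks.append(chunk)
            count -= len(chunk)
        return b"".join(chunks)


class SocketWriter:
    """hypothetical write-only layer over a socket source."""

    def __init__(self, sock):
        self._sock = sock

    def write(self, data):
        self._sock.sendall(data)


a, b = socket.socketpair()
sr, sw = SocketReader(a), SocketWriter(a)          # two objects for one socket
peer_r, peer_w = SocketReader(b), SocketWriter(b)  # and two more for the peer

sw.write(b"hello")
received = peer_r.read(5)
peer_w.write(b"world")
reply = sr.read(5)

a.close()
b.close()
```

once either layer grows its own buffer, the two objects would also have to
agree on who owns which bytes, which is the sync headache mentioned above.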

- - - - - - - - -

we had another problem, iirc, namely the "seeking-decoding problem":
what is the meaning of seeking in a character-stream over a file-stream?
the underlying file-stream works with bytes, while the text stream works
with characters of mixed widths...

* how do we handle unaligned-seeking?
* should tell() return "opaque cookies", instead of ints, so we could
  seek only to locations that tell() returned?
* still, if we modify the stream, these cookies need to be invalidated...
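
one way the cookie idea could look, as a sketch only. Cookie and
CookieTextStream are hypothetical names, and invalidating every cookie on any
write is a deliberately coarse assumption chosen to keep the code short:

```python
import io


class Cookie:
    """hypothetical opaque position: byte offset plus stream generation."""

    def __init__(self, offset, generation):
        self._offset = offset
        self._generation = generation


class CookieTextStream:
    """hypothetical text layer whose tell() hands out opaque cookies."""

    def __init__(self, raw, encoding):
        self._raw = raw
        self._encoding = encoding
        self._generation = 0   # bumped on every write

    def tell(self):
        return Cookie(self._raw.tell(), self._generation)

    def seek(self, cookie):
        # only positions handed out by tell() since the last write are valid
        if cookie._generation != self._generation:
            raise ValueError("stale cookie: stream was modified since tell()")
        self._raw.seek(cookie._offset)

    def write(self, text):
        self._raw.write(text.encode(self._encoding))
        self._generation += 1


f = CookieTextStream(io.BytesIO(), "utf-8")
f.write("ABC")
p1 = f.tell()
f.seek(p1)            # fine: nothing was written since tell()
f.write("D")
try:
    f.seek(p1)        # the write above invalidated p1
    stale = False
except ValueError:
    stale = True
```

a finer-grained scheme could invalidate only cookies at or after the modified
offset, at the cost of more bookkeeping.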

here's an example:
>>> f = TextAdapter(FileStream("somefile.txt", "w+"), "utf8")
>>> p0 = f.tell() # filepos = 0
>>> f.write("ABC")
>>> p1 = f.tell() # filepos = 3
>>> f.seek(p0)
>>> f.write(u"\U00010000") # takes 4 bytes in utf8
>>> f.seek(p1)
>>> f.read(1) # in the middle of a character now...
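
the same failure can be reproduced on plain bytes with today's io module. this
sketch uses utf-8 and a single four-byte character; both are choices made for
the demo, not something the design prescribes:

```python
import io

raw = io.BytesIO()
raw.write("\U00010000".encode("utf-8"))  # one character, four bytes
raw.seek(3)                               # a byte offset inside that character
tail = raw.read()

try:
    tail.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    # the remaining byte is a continuation byte with no lead byte before it
    decoded_ok = False
```

the byte stream is perfectly happy with offset 3; only the text layer knows
that it is not a character boundary.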

there was a suggestion back in the day, to make streams sequential
by definition (i.e., no seek/tell) -- mostly like java. a StreamReader can
skip() a number of characters (essentially equivalent to reading and
discarding the result)
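
skip() as "read and discard" could be sketched like this. the function name
follows the java method, while the chunk_size parameter is an assumption added
here so that large skips don't build one huge throwaway string:

```python
import io


def skip(stream, count, chunk_size=4096):
    """read and discard up to count characters; return how many were skipped."""
    skipped = 0
    while skipped < count:
        chunk = stream.read(min(chunk_size, count - skipped))
        if not chunk:        # end of stream reached early
            break
        skipped += len(chunk)
    return skipped


s = io.StringIO("hello world")
n = skip(s, 6)
rest = s.read()
```

this works on any sequential stream, since it needs nothing beyond read().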

instead of seeking at the stream level, the "sources" would grow
indexing (getitem/setitem) for that. a seekable source (only files actually)
could be accessed directly:

>>> f = FileSource("somefile.txt", "r+")
>>> f[10:20] = "hello worl"

i.e., the source acts like a sequence of bytes.
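
a sketch of what such an indexable source might look like on top of an
unbuffered file object. this FileSource is hypothetical; restricting it to
contiguous, same-length slices is an assumption made here so that assignment
never has to shift the rest of the file:

```python
import os
import tempfile


class FileSource:
    """hypothetical seekable source that acts like a sequence of bytes."""

    def __init__(self, path, mode="rb"):
        self._file = open(path, mode, buffering=0)   # unbuffered raw file

    def _bounds(self, index):
        if not isinstance(index, slice) or index.step not in (None, 1):
            raise TypeError("only contiguous slices are supported")
        size = os.fstat(self._file.fileno()).st_size
        start, stop, _ = index.indices(size)          # clamps to the file size
        return start, max(start, stop)

    def __getitem__(self, index):
        start, stop = self._bounds(index)
        self._file.seek(start)
        return self._file.read(stop - start)

    def __setitem__(self, index, data):
        start, stop = self._bounds(index)
        if len(data) != stop - start:
            raise ValueError("replacement must be as long as the slice")
        self._file.seek(start)
        self._file.write(data)

    def close(self):
        self._file.close()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "somefile.txt")
    with open(path, "wb") as fh:
        fh.write(b"0123456789" * 3)

    f = FileSource(path, "r+b")
    f[10:20] = b"hello worl"
    middle = f[10:20]
    whole = f[:]
    f.close()
```

the seek() still exists, but it is an implementation detail of the source
rather than part of the stream interface.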

but of course sources should be unbuffered, which may prove problematic
when we have buffering added by streams on top of the source...
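
a toy illustration of that buffering problem. MemorySource and BufferedStream
are hypothetical stand-ins written for this sketch (an in-memory source keeps
it self-contained), and the read-ahead policy is the simplest one possible:

```python
class MemorySource:
    """hypothetical unbuffered, indexable source backed by a bytearray."""

    def __init__(self, data):
        self._data = bytearray(data)

    def __getitem__(self, index):
        return bytes(self._data[index])

    def __setitem__(self, index, value):
        self._data[index] = value


class BufferedStream:
    """hypothetical sequential reader that reads ahead from the source."""

    def __init__(self, source, buffer_size=8):
        self._source = source
        self._buffer_size = buffer_size
        self._pos = 0          # next source offset to fetch
        self._buffer = b""     # bytes fetched but not yet handed out

    def read(self, count):
        if not self._buffer:
            self._buffer = self._source[self._pos:self._pos + self._buffer_size]
            self._pos += len(self._buffer)
        result, self._buffer = self._buffer[:count], self._buffer[count:]
        return result


source = MemorySource(b"abcdefgh")
stream = BufferedStream(source)

first = stream.read(2)    # fills the buffer with all eight bytes
source[2:4] = b"XY"       # direct write to the source, behind the stream's back
stale = stream.read(2)    # served from the buffer, so the write is not seen
fresh = source[2:4]
```

the stream and the source now disagree about bytes 2..4, and nothing in the
source tells the stream to drop its buffer.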

- - - - - - - - -

any suggestions? i'd guess the interfaces-supporters (Bill?) would prefer
a java-like stack (where subclasses of StreamReader provide reading
and subclasses of StreamWriter provide writing); duck-typing-supporters
and conservatives would probably prefer the succinct queriable approach
(streams support both read and write, but can be checked using the
relevant properties)

we ought to agree on the design here, if this is supposed to replace
the entire io stack... it may also be a good starting place for the supporters
of each camp to demonstrate how their approach "does it better".

tata,
- tomer