[Python-3000] encoding hell

tomer filiba tomerfiliba at gmail.com
Sun Sep 3 20:17:39 CEST 2006


> FileReader would be an InputStream,
> FileWriter would be an OutputStream

yes, this has been discussed, but that's too java-ish by nature.
besides, how would this model handle a simple operation, such as
file("foo", "w+") ?

opening TWO file descriptors for that purpose, one for reading and
another for writing, is a complete waste of resources: handles are not
cheap. not to mention that opening the same file multiple times may
run you into platform-specific pitfalls, like read-after-write bugs.

so the obvious solution is having an underlying "file-like object",
which is basically like today's file (supports read() AND write()),
over which InputStream and OutputStream each just expose a
different view:

f = file(...)
fr = FileReader(f)
fw = FileWriter(f)
fr.read()
fw.write()

now, this means you start with a "capable" object like file, with all of
the desired operations, and you intentionally CRIPPLE it down into
separate reading and writing front-ends.

so what sense does that make? if you want an InputStream, just be
sure you only call read() or readall(); if you want an OutputStream,
limit yourself to calling write(). input-only/output-only streams are
just silly and artificial overhead -- we don't need them.

the java/.NET world relies on interfaces so much that such a split
might make sense in that context. but that's not the python way.

> no sooner do you introduce the
> key methods read and write, than you supplement them with capability
> queries readable and writable that check whether these methods may
> even be called. IMO this is a clear indication that these methods
> really want to be refactored into separate classes.

the reason is that some streams, like pipes or partially shutdown()ed
sockets, may be unidirectional; some (e.g., sockets) may not support
seeking -- but the 2nd layer may augment that. for example, the
BufferingLayer may add seeking (it already supports unreading).

that's why streams are queryable -- iostack has a layered structure
that allows each layer to add more functionality to the underlying
layer. in other words, all streams are NOT born equal, but they can
be made equal later :)
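
a minimal sketch of the layering idea, to make it concrete. the names
ForwardOnly and SeekableLayer are hypothetical (this is NOT iostack's
actual BufferingLayer): the layer simply remembers everything it has
read, so a forward-only source gains seeking within the buffered data.

```python
import io

class ForwardOnly(io.RawIOBase):
    """stand-in for a socket-like source: readable, but not seekable."""
    def __init__(self, data):
        self._src = io.BytesIO(data)
    def readable(self):
        return True
    def readinto(self, b):
        return self._src.readinto(b)

class SeekableLayer:
    """hypothetical layer that adds seeking on top of a forward-only stream."""
    def __init__(self, raw):
        self.raw = raw
        self.buf = bytearray()   # everything read from raw so far
        self.pos = 0             # current position within buf
    def readable(self):
        return self.raw.readable()
    def seekable(self):
        return True              # the capability this layer adds
    def read(self, n):
        missing = self.pos + n - len(self.buf)
        if missing > 0:
            # a real source may return fewer bytes than requested
            self.buf += self.raw.read(missing) or b""
        data = bytes(self.buf[self.pos:self.pos + n])
        self.pos += len(data)
        return data
    def seek(self, pos):
        # only absolute positions inside already-buffered data
        if not 0 <= pos <= len(self.buf):
            raise ValueError("cannot seek outside buffered data")
        self.pos = pos
        return pos

raw = ForwardOnly(b"abcdef")
layered = SeekableLayer(raw)
first = layered.read(3)
layered.seek(0)
again = layered.read(4)
```

the raw stream reports seekable() as false, while the layered one
reports true: the stream was "made equal" after the fact.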

that way, when your function accepts a stream as an argument,
it would just check s.readable or s.seekable, without regard to the
*type* of s itself, or the underlying storage --

it may be a file, it may be a buffered socket, but as long as you can
read from it/seek in it, your code would work just fine. kinda like
duck-typing.
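
a rough sketch of such a function. it assumes the standard io module,
where readable() and seekable() are methods rather than the attributes
used above; read_tail is a made-up name for illustration.

```python
import io

def read_tail(stream, n):
    """return the last n bytes of any stream that can read and seek."""
    # ask the stream what it can do instead of checking its type
    if not (stream.readable() and stream.seekable()):
        raise ValueError("stream must be readable and seekable")
    end = stream.seek(0, io.SEEK_END)   # seek() returns the new position
    stream.seek(max(0, end - n))
    return stream.read(n)

tail = read_tail(io.BytesIO(b"hello world"), 5)
```

the function never looks at the type of its argument, so anything that
answers the two queries truthfully can be passed in.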

> FileBytes would support the
> sequence protocol, mimicking bytes objects.  It would support
> random-access read and write using __getitem__ and __setitem__,
> allowing slice assignment for slices of equal size.

this may be a good direction. i'll try to see how it fits in.
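
for what it's worth, python's mmap objects already behave roughly like
this: they support indexing and slice assignment for slices of equal
size. the sketch below only illustrates the idea and is not the
proposed FileBytes; overwrite_in_place is a made-up helper name.

```python
import mmap
import tempfile

def overwrite_in_place(f, start, payload):
    """overwrite len(payload) bytes of an open binary file at offset start."""
    with mmap.mmap(f.fileno(), 0) as m:          # length 0 maps the whole file
        m[start:start + len(payload)] = payload  # equal-size slice assignment
        m.flush()

with tempfile.TemporaryFile() as f:
    f.write(b"hello world")
    f.flush()
    overwrite_in_place(f, 0, b"HELLO")
    f.seek(0)
    result = f.read()
```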


-tomer

On 9/3/06, Anders J. Munch <2006 at jmunch.dk> wrote:
> tomer filiba wrote:
>  > my solution would be completely leaving seek() and tell() out of the
>  > 3rd layer -- it's a byte-level operation.
>  >
>  > anyone thinks differently? if so, what's your solution?
>
> seek and tell are a poor mans sequence.  I would have nothing by those
> names.
>
> I would have input streams, output streams and sequences, and I
> wouldn't mix the three.  FileReader would be an InputStream,
> FileWriter would be an OutputStream.  FileBytes would support the
> sequence protocol, mimicking bytes objects.  It would support
> random-access read and write using __getitem__ and __setitem__,
> allowing slice assignment for slices of equal size.  And there would
> be append() to extend the file, and partial __delitem__ support for
> truncating.
>
> Looking at your iostack2 Stream class, no sooner do you introduce the
> key methods read and write, than you supplement them with capability
> queries readable and writable that check whether these methods may
> even be called.  IMO this is a clear indication that these methods
> really want to be refactored into separate classes.
>
> I think you'll find that separating input, output and random access
> into three separate ADTs will much simplify BufferingLayer (even
> though you'll need three of them).  At least if you intend to take
> interactions between reads and writes into account.
>
> regards,
> Anders
>
>

