[Python-3000] Google Sprint Ideas

Mon Aug 21 03:34:28 CEST 2006

Guido van Rossum wrote:
> On 8/20/06, Talin <talin at acm.org> wrote:
>> Guido van Rossum wrote:
>> > On 8/20/06, Paul Moore <p.f.moore at gmail.com> wrote:
>>
>> > Without endorsing every detail of his design, tomer filiba has written
>> > several blog (?) entries about this, the latest being
>> > http://sebulba.wikispaces.com/project+iostack+v2 . You can also look
>> > at sandbox/sio/sio.py in svn.
>>
>> One comment after reading this: If we're going to re-invent the Java/C#
>> i/o library, could we at least use the same terminology? In particular,
>> the term "Layer" has connotations which may be confusing in this context
>> - I would prefer something like "Adapter" or "Filter".
> 
> That's an example of what I meant when I said "without endorsing every 
> detail".
> 
> I don't know which terminology C++ uses beyond streams. I think Java
> uses Streams for the lower-level stuff and Reader/Writer for the
> higher-level stuff -- or is it the other way around?

Well, the situation with Java is kind of complex. There are two sets of 
stream classes, but rather than classifying them as "low-level" and 
"high-level", a better classification is "old" and "new". The old 
classes (InputStream/OutputStream) are byte-oriented, whereas the newer 
ones (Reader/Writer) are character-oriented. It is not the case, 
however, that the character-oriented interface sits on top of the 
byte-oriented interface - rather, both interfaces are implemented by a 
number of different back ends.

For purposes of Python, it probably makes more sense to look at the .Net 
System.IO.Stream. (As a general rule, the .Net classes are refactored 
versions of the Java classes, which is both good and bad. It's best to 
study both if one is looking for inspiration.)

Hmmm, apparently the .Net documentation *does* use the term 'layer' to 
describe one stream wrapping another - which I still find strange. To my 
mind, the term 'layer' can either describe a particular design stratum 
within an architecture - such as the 'device layer' of an operating 
system - or it can describe a portion of a document, such as a drawing 
layer in a CAD program. I don't normally think of a single instance of a 
class wrapping another instance as constituting a "layer" - I usually 
use the term "adapter" or "proxy" to describe that case.

(OK, so I'm pedantic about naming. Now you know why one of my side 
projects is writing an online programmer's thesaurus -- using 
Python/TurboGears of course!)

>> Also, I notice that this proposal removes what I consider to be a nice
>> feature of Python, which is that you can take a plain file object and
>> iterate over the lines of the file -- it would require a separate line
>> buffering adapter to be created. I think I understand the reasoning
>> behind this - in a world with multiple text encodings, the definition of
>> "line" may not be so simple. However, I would assume that the "built-in"
>> streams would support the most basic, least-common-denominator encodings
>> for convenience.
> 
> First time I noticed that. But perhaps it's the concept of "plain file
> object" that changed? My own hierarchy (which I arrived at without
> reading tomer's proposal) is something like this:
> 
> (1) Basic level (implemented in C) -- open, close, read, write, seek,
> tell. Completely unbuffered, maps directly to system calls. Does
> binary I/O only.
> 
> (2) Buffering. Implements the same API as (1) but adds buffering. This
> is what one normally uses for binary file I/O. It builds on (1), but
> can also be built on raw sockets instead. It adds an API to inquire
> about the amount of buffered data, a flush() method, and ways to
> change the buffer size.
> 
> (3) Encoding and line endings. Implements a somewhat different API,
> for reading/writing text files; the API resembles Python 2's I/O
> library more. This is where readline() and next() giving the next line
> are implemented. It also does newline translation to/from the
> platform's native convention (CRLF or LF, or perhaps CR if anyone
> still cares about Mac OS <= 9) and Python's convention (always \n). I
> think I want to put these two features (encoding and line endings) in
> the same layer because they are both text related. Of course you can
> specify ASCII or Latin-1 to effectively disable the encoding part.
> 
> Does this make more sense?

I understood that much -- this is pretty much the way everyone does 
things these days (our own custom stream library at work looks much 
the same).
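
As a rough illustration of the three levels quoted above, here is a minimal
sketch. It assumes the io module that later shipped with Python 3 as a
stand-in; the class names (FileIO, BufferedReader, TextIOWrapper) come from
that module and are not part of the proposal discussed in this thread. The
sample file and its contents are made up for the example.

```python
import io
import os
import tempfile

# Create a sample file in binary mode so the bytes on disk are known exactly
# (CRLF line endings, regardless of platform).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"first\r\nsecond\r\n")

# Level 1: raw, unbuffered, binary-only access that maps to system calls.
raw = io.FileIO(path, "r")

# Level 2: same binary API, with buffering added on top of level 1.
buffered = io.BufferedReader(raw, buffer_size=4096)

# Level 3: text layer handling both decoding and newline translation.
# newline=None asks for translation of CRLF/CR/LF to "\n" on input.
text = io.TextIOWrapper(buffered, encoding="latin-1", newline=None)

# Iterating over lines is a feature of the text level only.
lines = list(text)

# Closing the outermost object closes the objects it wraps.
text.close()
os.remove(path)
```

Under these assumptions, line iteration lives at level 3, so whether a
"plain file object" can be iterated by line depends on which level the
built-in constructor hands back.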

What I was wondering is whether the built-in 'file' function will 
return an object of level 3.
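
One way to make the question concrete is to check which level a built-in
constructor returns for a given mode. The sketch below assumes the open()
function and io module of later Python 3 releases, not anything decided in
this thread; the temporary file is only there to have something to open.

```python
import io
import os
import tempfile

# Create an empty temporary file to open in different modes.
fd, path = tempfile.mkstemp()
os.close(fd)

# Text mode: the returned object is the text (level 3) wrapper.
with open(path, "r", encoding="ascii") as text_file:
    text_type = type(text_file)

# Binary mode with default buffering: a buffered (level 2) object.
with open(path, "rb") as binary_file:
    binary_type = type(binary_file)

# Binary mode with buffering disabled: the raw (level 1) object.
with open(path, "rb", buffering=0) as raw_file:
    raw_type = type(raw_file)

os.remove(path)
```

In that later design the level returned depends on the mode and buffering
arguments, with text mode giving the level 3 object.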

-- Talin