[Python-3000] Thoughts on dictionary views

Wed Feb 21 03:10:37 CET 2007

"Guido van Rossum" <guido at python.org> wrote:
> On 2/20/07, Paul Moore <p.f.moore at gmail.com> wrote:
> > (I have similar concerns over the "new IO" proposals I've
> > seen, but there's nothing concrete there yet, so I'll save that
> > argument for another day...)
> 
> Then you should also have misgivings about the Unicode/str
> unification. If you are cool with that, I don't see how we can avoid
> redoing the I/O library.

I'm not so sure.  The return type of socket.recv and os.read could be
changed to bytes (seemingly without much difficulty), and they could
likely even be changed to *take* a bytes object as the destination
buffer (ditto for files opened as 'raw').  Beyond that, the only
changes needed would be updating the standard library code that calls
socket.recv, os.read, etc., to expect a bytes object for incoming data,
and raising an exception when someone tries to write a unicode object.
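
A minimal sketch of that idea, written against present-day Python 3
rather than anything that existed when this message was posted.  It
assumes bytearray as the mutable destination buffer and uses
socket.recv_into; the helper names recv_into_buffer and send_bytes_only
are hypothetical and only illustrate the "read into a bytes buffer,
refuse to write text" behaviour described above.

```python
import socket


def recv_into_buffer(sock, size):
    """Read up to `size` bytes from `sock` into a preallocated buffer."""
    buf = bytearray(size)          # mutable bytes object used as destination
    n = sock.recv_into(buf, size)  # fills buf in place, returns the byte count
    return bytes(buf[:n])


def send_bytes_only(sock, data):
    """Send `data`, raising if the caller passes text instead of bytes."""
    if isinstance(data, str):
        raise TypeError("expected bytes, got text; encode it first")
    sock.sendall(data)


# A connected pair of sockets stands in for a real network peer.
left, right = socket.socketpair()
try:
    send_bytes_only(left, b"PING\r\n")
    reply = recv_into_buffer(right, 1024)
finally:
    left.close()
    right.close()
```

Whether a wrapper like this or the socket type itself should reject
text is a design choice; the sketch only shows that the check is small.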

Of course, even with the proposed updated I/O library, every one of
those modules would have to be changed anyways.

Then again, I've been "eh?" on the whole I/O library thing, and
generally annoyed at the "everything is unicode" idea.  Converting all
libraries that currently deal with IO is going to be a pain, especially
for those that parse a mix of binary and non-unicode textual data (like
HTTP headers combined with binary posted data or a utf-8 encoded
stream).
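
To make the mixed-data case concrete, here is a rough sketch in
present-day Python 3 of handling an HTTP-style message: the header
block is decoded as text while the posted body stays as raw bytes.
The function name split_http_message and the sample request are made
up for illustration, and decoding headers as latin-1 is an assumption
of the sketch, not something this message specifies.

```python
def split_http_message(raw):
    """Split raw bytes into (request_line, headers, body).

    The header block is decoded to text; the body is left as bytes.
    """
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("incomplete header block")
    lines = head.decode("latin-1").split("\r\n")
    request_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return request_line, headers, body


# Hypothetical request: textual headers followed by a binary payload.
raw = (b"POST /upload HTTP/1.1\r\n"
       b"Content-Type: application/octet-stream\r\n"
       b"Content-Length: 4\r\n"
       b"\r\n"
       b"\x00\xff\x10\x80")
request_line, headers, body = split_http_message(raw)
```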

As a heavy user of quite a few of the current standard library IO
modules (SocketServer, asyncore, urllib, socket, etc.) and as someone
who has the "opportunity" to write line-level protocols, I'd be quite
happy with the following...

1) add bytes (or add features to array)
2) rename unicode to text (or str)
3) rename str to bin (or some other sufficiently clear name)
4) make string literals like 'hello' be unicode
5) allow b'constant' literals to be the renamed str
6) add a mandatory 3rd argument to file/open which is the codec to use
for reading
7) offer a new function for opening 'binary' files (which opens them as
'rb' or 'wb' whenever 'r' or 'w' is passed, respectively), which will
remove confusion on Windows platforms
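
Items 4 through 7 can be sketched roughly as follows in present-day
Python 3, where 'hello' is text and b'constant' is bytes.  The helpers
open_text and open_binary are hypothetical names invented for this
sketch; they wrap the built-in open() and are only one possible reading
of the list above.

```python
import os
import tempfile

text_literal = 'hello'        # item 4: a plain literal is text
bytes_literal = b'constant'   # item 5: a b'' literal is binary data


def open_text(path, mode, codec):
    """Open a text file; the codec argument is mandatory (item 6)."""
    if "b" in mode:
        raise ValueError("use open_binary() for binary files")
    return open(path, mode, encoding=codec)


def open_binary(path, mode):
    """Open a binary file; 'r' or 'w' is treated as 'rb' or 'wb' (item 7)."""
    if "b" not in mode:
        mode += "b"
    return open(path, mode)


# Write text through a codec, then read it back both ways.
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open_text(path, "w", "utf-8") as f:
        f.write("caf\u00e9")
    with open_binary(path, "r") as f:
        raw = f.read()
    with open_text(path, "r", "utf-8") as f:
        text = f.read()
finally:
    os.remove(path)
```

Reading the same file through open_binary returns the encoded bytes,
while open_text returns decoded text, so the caller picks the type up
front instead of converting afterwards.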

Indeed, it isn't as revolutionary as "everything is unicode", but it
would allow the standard library to be updated with a relative minimum
of fuss and muss, without needing to intermix...
    x = bytes.decode('latin-1').USEFUL_UNICODE_METHOD(...)
or
    sock.send(unicode.encode('latin-1'))

 - Josiah