[Python-Dev] str object going in Py3K

Wed Feb 15 18:25:59 CET 2006

On 2/15/06, Fuzzyman <fuzzyman at voidspace.org.uk> wrote:
>  Forcing the programmer to be aware of encodings, also pushes the same
> requirement onto the user (who is often the source of the text in question).

The programmer shouldn't have to be aware of encodings most of the
time -- it's the job of the I/O library to determine the end user's
(as opposed to the language's) default encoding dynamically and act
accordingly. Users who use non-ASCII characters without informing the
OS of their encoding are in a world of pain, *unless* they use the OS
default encoding (which may vary per locale). If the OS can figure out
the default encoding, so can the Python I/O library. Many apps won't
have to go beyond this at all.
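As a rough illustration in later Python 3 terms (a sketch, not part of the original proposal): the helpers below ask the locale for the user's default encoding and let the I/O layer encode on write and decode on read. This assumes the standard `locale.getpreferredencoding(False)` is an acceptable stand-in for "the OS default encoding"; the names `user_default_encoding`, `write_text` and `read_text` are hypothetical.

```python
import locale
import os
import tempfile


def user_default_encoding() -> str:
    # Assumption: the locale's preferred encoding stands in for
    # "the end user's default encoding" as reported by the OS.
    return locale.getpreferredencoding(False)


def write_text(path: str, text: str) -> None:
    # Hypothetical helper: the I/O layer encodes str -> bytes on write.
    with open(path, "w", encoding=user_default_encoding()) as f:
        f.write(text)


def read_text(path: str) -> str:
    # Hypothetical helper: the I/O layer decodes bytes -> str on read,
    # so the caller only ever handles text.
    with open(path, "r", encoding=user_default_encoding()) as f:
        return f.read()


# Usage: the application code never names an encoding itself.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "notes.txt")
    write_text(path, "plain ASCII text\n")
    result = read_text(path)

print(result, end="")
```

The sample text is plain ASCII so the round trip works under any ASCII-compatible locale encoding.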

Note that I don't want to use this OS/user default encoding as the
default encoding between bytes and strings; once you are reading bytes
you are writing "grown-up" code and you will have to be explicit. It's
only the I/O library that should automatically encode on write and
decode on read.
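On the bytes side, by contrast, every conversion is spelled out. A small Python 3 illustration of that split (UTF-8 and the sample string are arbitrary choices for the example):

```python
# str and bytes are separate types; converting between them names the encoding.
raw = "caf\u00e9".encode("utf-8")   # str -> bytes
print(type(raw).__name__, raw)       # bytes b'caf\xc3\xa9'

text = raw.decode("utf-8")           # bytes -> str
print(text == "caf\u00e9")           # True

# No implicit default conversion: mixing the two types is an error.
mixing_failed = False
try:
    "abc" + raw
except TypeError:
    mixing_failed = True
print(mixing_failed)                 # True
```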

>  Currently you can read a text file and process it - making sure that any
> changes/requirements only use ASCII characters. It therefore doesn't matter
> what 8-bit ASCII-superset encoding is used in the original. If you force the
> programmer to specify the encoding in order to read the file, they would
> have to pass that requirement onto their user. Their user is even less
> likely to be encoding aware than the programmer.

I disagree -- the user most likely has set or received a default
encoding when they first got the computer, and that's all they are
using. If other tools (notepad, wordpad, emacs, vi etc.) can figure
out the encoding, so can Python's I/O library.

>  What this means is that for simple programs where the programmer doesn't
> want to have to worry about encoding, or can't force the user to be aware,
> they will read in the file as bytes.

Of course not!

> Modules will quickly and inevitably be
> created implementing all the 'string methods' for bytes. New programmers
> will gravitate to these and the old mess will continue, but with a more
> awkward hybrid than before. (String manipulations of byte sequences will no
> longer be a core part of the language - and so be harder to use.)

This seems an unlikely development if we do the conversions in the I/O library.

>  Not sure what we can do to obviate this of course... but is this change
> actually going to improve the situation or make it worse?

I'm not worried about this scenario. "What if all the programmers in
the world suddenly became dumb?"

--
--Guido van Rossum (home page: http://www.python.org/~guido/)