[Python-Dev] Re: Unicode debate

Just van Rossum just@letterror.com
Fri, 28 Apr 2000 09:33:16 +0100


At 11:01 AM -0400 27-04-2000, Guido van Rossum wrote:
>Where does the current approach require work?
>
>- We need a way to indicate the encoding of Python source code.
>(Probably a "magic comment".)

How will other parts of a program know which encoding was used for
non-Unicode string literals?

It seems to me that an encoding attribute for 8-bit strings solves this
nicely. The attribute should only be set automatically if the encoding of
the source file was specified or when the string has been encoded from a
Unicode string. The attribute should *only* be used when converting to
Unicode. (Hm, it could even be used when calling unicode() without the
encoding argument.) It should *not* be used when comparing (or adding,
etc.) 8-bit strings to each other, since they may still contain binary
goop, even in a source file with a specified encoding!
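
To make the idea concrete, here is a rough sketch of how such an attribute
might behave. This is only an illustration, not a proposed implementation:
the names TaggedStr, encode_tagged and to_unicode are made up for this
example, and it is written against a bytes type standing in for the 8-bit
string.

```python
class TaggedStr(bytes):
    """Hypothetical 8-bit string that remembers its encoding, if known."""

    def __new__(cls, data, encoding=None):
        self = super().__new__(cls, data)
        self.encoding = encoding  # None means "unknown, possibly binary"
        return self


def encode_tagged(text, encoding):
    """Encode a Unicode string and tag the result automatically."""
    return TaggedStr(text.encode(encoding), encoding)


def to_unicode(s, encoding=None):
    """Convert to Unicode; an explicit argument wins over the tag."""
    enc = encoding or getattr(s, "encoding", None)
    if enc is None:
        raise ValueError("no encoding given and none recorded")
    return s.decode(enc)


a = encode_tagged("caf\u00e9", "latin-1")
b = TaggedStr(b"caf\xe9", "utf-8")  # same bytes, different tag

# The tag is consulted only on conversion to Unicode.
text = to_unicode(a)

# Comparison and concatenation ignore the tag entirely; the
# concatenated result is a plain 8-bit string with no tag at all.
same = (a == b)
joined = a + b
```

Note that in this sketch the tag is simply dropped on concatenation, which
is one way to honour the "may contain binary goop" caveat.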

>- We need a way to indicate the encoding of input and output data
>files, and we need shortcuts to set the encoding of stdin, stdout and
>stderr (and maybe all files opened without an explicit encoding).

Can you open a file *with* an explicit encoding?

Just