[Python-Dev] RE: Defining Unicode Literal Encodings

Paul Prescod paulp@ActiveState.com
Fri, 13 Jul 2001 15:46:02 -0700

"M.-A. Lemburg" wrote:
> ....
> Please don't mix 8-bit strings with Unicode literals: 8-bit
> strings don't carry any encoding information, so providing encoding
> information cannot be stored anywhere.

First, we could store the information if we wanted to.

Second, whether we choose to store the information or not, the point is
that the source file should not mix encodings.

> Comments, OTOH, are part of the program text, so they have to be ASCII
> just like the Python source itself.

The Python interpreter allows non-ASCII characters in comments.

> Hmm, good point, but hard to implement. We'd probably need a two
> phase decoding for this to work:
> 1. decode the given Unicode literal encoding
> 2. decode any Unicode escapes in the Unicode string

That doesn't sound so hard. :)
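
To make the two phases concrete, here is a rough sketch in modern Python.
It is only an illustration under stated assumptions, not a proposed
implementation: the helper name decode_unicode_literal is made up, it
handles only \uXXXX escapes, and it assumes the literal body has already
been cut out of the source file as raw bytes.

```python
import re

# Matches either an escaped backslash pair (left untouched) or a \uXXXX escape.
_ESCAPE = re.compile(r"\\\\|\\u([0-9a-fA-F]{4})")


def decode_unicode_literal(raw_bytes, source_encoding):
    """Hypothetical helper: two-phase decoding of a Unicode literal body."""
    # Phase 1: turn the source bytes into characters using the declared encoding.
    text = raw_bytes.decode(source_encoding)

    # Phase 2: expand the \uXXXX escapes that are still spelled out in the text.
    def expand(match):
        if match.group(1) is None:
            # An escaped backslash pair: keep it exactly as written.
            return match.group(0)
        return chr(int(match.group(1), 16))

    return _ESCAPE.sub(expand, text)


# Latin-1 source bytes: "caf", a \u00e9 escape, a space, and a raw u-umlaut byte.
decoded = decode_unicode_literal(b"caf\\u00e9 \xfc", "latin-1")
```

The order matters: the escapes are plain ASCII in the file, so they survive
phase 1 unchanged and can be expanded afterwards.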

> I think that allowing one directive per file is the way to go,
> but I'm not sure about the exact position. Basically, I think it
> should go "near" the top, but not necessarily before any doc-string
> in the file.

If Guido is violently opposed to having it before the docstring, then we
could allow it either before or after the docstring to give tools time
to catch up.

I'm not sure which tools in particular have the problem, though. Any tool
that uses introspection or inspect.py will be fine.
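
A small sketch of what I mean, in modern Python. It assumes the directive
ends up spelled as a comment line, which is only an assumption here since
the syntax is not settled; the directive text and the module name "demo"
are placeholders.

```python
import inspect
import types

# Hypothetical source file: a placeholder directive line, then the docstring.
SOURCE = (
    "# hypothetical encoding directive goes here\n"
    '"""Module docstring."""\n'
    "x = 1\n"
)

# Compile and run the source in a fresh module object, as an import would.
module = types.ModuleType("demo")
exec(compile(SOURCE, "demo.py", "exec"), module.__dict__)

# Introspection asks the compiled module, so it never parses the file itself.
doc = inspect.getdoc(module)
```

A tool that scans the file text for a leading string would be the kind
that needs time to catch up; one that asks the module for its docstring
gets whatever the compiler recognized.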

Take a recipe. Leave a recipe.  
Python Cookbook!  http://www.ActiveState.com/pythoncookbook