[Python-Dev] RE: Defining Unicode Literal Encodings
Sat, 14 Jul 2001 13:45:10 +0200
Paul Prescod wrote:
> "M.-A. Lemburg" wrote:
> > ....
> > Please don't mix 8-bit strings with Unicode literals: 8-bit
> > strings don't carry any encoding information, so the encoding
> > information cannot be stored anywhere.
> First, we could store the information if we want.
> Second, whether we choose to store the information or not, the point is
> that the source file should not mix encodings.
I have added a new paragraph to the PEP (see my rev. 1.1 posting)
pointing out that it is the programmer's responsibility to choose
reasonable encodings; in particular, the encodings used should be
compatible so that a text editor can display the data correctly.
> > Comments, OTOH, are part of the program text, so they have to be ASCII
> > just like the Python source itself.
> The Python interpreter allows non-ASCII characters in comments.
> > Hmm, good point, but hard to implement. We'd probably need a two-phase
> > decoding for this to work:
> > 1. decode the given Unicode literal encoding
> > 2. decode any Unicode escapes in the Unicode string
> That doesn't sound so hard. :)
True. The issue here is very similar to the distinction between standard
and raw literals. Perhaps step 2 should only be applied to standard
literals, while raw ones stop after step 1.
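A minimal Python sketch of the two-phase decoding discussed above, assuming the literal's body arrives as raw source bytes together with the declared encoding name. The names decode_unicode_literal and raw are hypothetical, and phase 2 here understands only a small subset of escapes (\\, \n, \t, quotes, \uXXXX, \UXXXXXXXX); this is an illustration of the proposal, not how the interpreter actually does it.

```python
import re

# Escapes understood in phase 2 of this sketch (a small subset of Python's).
_SIMPLE_ESCAPES = {"\\": "\\", "n": "\n", "t": "\t", "'": "'", '"': '"'}
_ESCAPE_RE = re.compile(r"\\(u[0-9a-fA-F]{4}|U[0-9a-fA-F]{8}|.)", re.DOTALL)


def _replace_escape(match: re.Match) -> str:
    seq = match.group(1)
    if len(seq) > 1:
        # \uXXXX or \UXXXXXXXX: convert the hex code point to a character.
        return chr(int(seq[1:], 16))
    # Known single-character escape; unknown ones keep their backslash.
    return _SIMPLE_ESCAPES.get(seq, "\\" + seq)


def decode_unicode_literal(body: bytes, encoding: str, raw: bool = False) -> str:
    """Decode a Unicode literal's body in two phases (hypothetical helper)."""
    # Phase 1: turn the literal's source bytes into text via the declared encoding.
    text = body.decode(encoding)
    if raw:
        # Raw literals stop after phase 1: backslashes are kept verbatim.
        return text
    # Phase 2: standard literals additionally interpret backslash escapes.
    return _ESCAPE_RE.sub(_replace_escape, text)


# Latin-1 source bytes: "caf", e-acute, a space, then the six characters \u20ac.
src = b"caf\xe9 \\u20ac"
assert decode_unicode_literal(src, "latin-1") == "caf\u00e9 \u20ac"
assert decode_unicode_literal(src, "latin-1", raw=True) == "caf\u00e9 \\u20ac"
```

Doing phase 1 first means the escape processing in phase 2 always operates on already-decoded text, so it never has to know which source encoding was declared.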
> > I think that allowing one directive per file is the way to go,
> > but I'm not sure about the exact position. Basically, I think it
> > should go "near" the top, but not necessarily before any doc-string
> > in the file.
> If Guido is violently opposed to having it before the docstring then we
> could allow it either before or after the docstring to give tools time
> to catch up.
> I'm not sure what tools in particular have the problem, though. Any tool
> that uses introspection or inspect.py will be fine.
See my other posting for ways to work around this problem.
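A rough sketch of the "one directive per file, either before or after the docstring" rule, in Python. The directive syntax "# unicode-encoding: <name>" and the function find_encoding_directive are hypothetical, chosen only so that the file still parses with the standard ast module; the check relies on end_lineno, so it needs Python 3.8 or later.

```python
import ast
import re
from typing import Optional

# Hypothetical directive syntax for this sketch; a comment keeps the file parseable.
_DIRECTIVE_RE = re.compile(r"^#\s*unicode-encoding:\s*([-\w.]+)\s*$")


def find_encoding_directive(source: str) -> Optional[str]:
    """Return the declared encoding, or None if the file has no directive.

    Raises ValueError if there is more than one directive, or if it is
    neither before the module docstring nor between the docstring and
    the first real statement. Directive-like lines inside multi-line
    strings are not excluded; this is only a sketch.
    """
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        m = _DIRECTIVE_RE.match(line)
        if m:
            hits.append((lineno, m.group(1)))
    if not hits:
        return None
    if len(hits) > 1:
        raise ValueError("only one encoding directive is allowed per file")
    lineno, encoding = hits[0]

    body = ast.parse(source).body
    first = body[0] if body else None
    has_docstring = (
        isinstance(first, ast.Expr)
        and isinstance(first.value, ast.Constant)
        and isinstance(first.value.value, str)
    )
    if not has_docstring:
        # No docstring: the directive must precede the first statement.
        if first is None or lineno < first.lineno:
            return encoding
        raise ValueError("encoding directive must precede the first statement")
    if lineno < first.lineno:
        return encoding  # placed before the docstring
    next_start = body[1].lineno if len(body) > 1 else float("inf")
    if first.end_lineno < lineno < next_start:
        return encoding  # placed after the docstring, before any code
    raise ValueError("encoding directive must come before or directly after the docstring")


after_doc = '"""Module doc."""\n# unicode-encoding: latin-1\nx = 1\n'
assert find_encoding_directive(after_doc) == "latin-1"
```

Accepting both positions lets existing tools that expect the docstring on the first line keep working, while new files can put the directive at the very top.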
CEO eGenix.com Software GmbH
Consulting & Company: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/