[Python-Dev] PEP 263 - Defining Python Source Code Encodings
M.-A. Lemburg
mal@lemburg.com
Sun, 14 Jul 2002 18:21:34 +0200
Martin v. Loewis wrote:
> "Fredrik Lundh" <fredrik@pythonware.com> writes:
>
>
>>hmm. I'm tempted to think that there's a major
>>flaw in the PEP, caused by the fact that
>>
>> compile(unicode(script, extract_encoding(script)))
>>
>>will, from what I can tell, not compile to the same
>>thing as:
>>
>> compile(script)
>
>
> Can you elaborate what you think the difference is? I believe the PEP
> is silent on this specific aspect,

It does mention this as part of phase 2.

> but I think what should happen is
> (in the Unicode case):
>
> - compile will convert the script to UTF-8, which is then tokenized.
> - in the process of parsing, the encoding declaration (that presumably
> extract_encoding was looking at as well) is recognized, if any.
> - Unicode literals are left as-is; byte string literals are converted
> back to the original encoding.

Right.

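As a rough illustration of the three steps above, here is a minimal sketch in
present-day Python. It is not the implementation the PEP describes:
extract_encoding and literal_bytes are hypothetical helpers, the regular
expression is only modelled on the declaration pattern in the PEP, and the
"ascii" fallback is an assumption.

```python
import re

# Modelled on the PEP 263 declaration pattern; an approximation only.
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")


def extract_encoding(script, default="ascii"):
    """Hypothetical helper: find a declaration in the first two lines."""
    for line in script.splitlines()[:2]:
        if line.lstrip().startswith("#"):
            match = CODING_RE.search(line)
            if match:
                return match.group(1)
    return default  # assumption: fall back to ASCII


def literal_bytes(literal_text, script):
    """Bytes a byte string literal would hold after the round trip."""
    utf8 = literal_text.encode("utf-8")           # step 1: script as UTF-8
    declared = extract_encoding(script)           # step 2: read declaration
    return utf8.decode("utf-8").encode(declared)  # step 3: back to original


script = "# -*- coding: latin-1 -*-\ns = '\u00e9'\n"
print(extract_encoding(script))
print(literal_bytes("\u00e9", script))
```

With a declaration present, the byte string ends up with the same bytes it
had in the original file, which is why no difference should be visible.
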
> So if there is an encoding declaration in script, then I cannot see a
> difference. If there is none, the PEP does not elaborate what should
> happen. Leaving the byte strings as UTF-8 seems safest, since the only
> way to get "correct" non-ASCII strings without the encoding comment is
> to use the UTF-8 signature.
>
> In any case, this can't cause backwards compatibility
> problems. compile accepts Unicode strings today only if they can be
> converted to a byte string. In the standard installation, this will
> fail today if there is non-ASCII in script. So allowing Unicode in
> compile is a pure extension. If its precise meaning is underspecified,
> it should be clarified before stage 2 is implemented.

No need for this. The PEP already mentions it.

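The backwards compatibility point can be sketched as follows. This only
models the behaviour Martin describes: the implicit conversion to a byte
string is replaced by an explicit encode(), the default encoding is assumed
to be "ascii", and both function names are made up for illustration.

```python
def to_byte_string(script, default_encoding="ascii"):
    """Stand-in for the implicit Unicode to byte string conversion."""
    return script.encode(default_encoding)


def accepted_today(script):
    """True if the modelled conversion succeeds, i.e. the script is ASCII."""
    try:
        to_byte_string(script)
    except UnicodeEncodeError:
        return False
    return True


print(accepted_today("x = 1\n"))
print(accepted_today("s = '\u00e9'\n"))
```

Any script rejected by this check could not be compiled from Unicode before,
so giving it a meaning cannot break existing code.
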
--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH