[Python-Dev] PEP 263 - Defining Python Source Code Encodings

M.-A. Lemburg mal@lemburg.com
Sat, 13 Jul 2002 22:25:23 +0200


Fredrik Lundh wrote:
> guido wrote:
>
>>There's a full implementation for PEP 263.  Martin von Loewis is ready
>>to commit it.  It's of course possible to let him do this and deal
>>with the consequences once they're in CVS, but I'd like to see if there's
>>anyone who'd like to review the code before it goes in.  The patch is
>>at http://python.org/sf/534304.  I like the PEP fine, I just don't
>>have time to review the patch.
>
> hmm.  I'm tempted to think that there's a major
> flaw in the PEP, caused by the fact that
> 
>     compile(unicode(script, extract_encoding(script)))
> 
> will, from what I can tell, not compile to the same
> thing as:
> 
>     compile(script)
> 
> but I've had too many holy [gr]ails [1] tonight to
> be sure if that's really a flaw at all...

Right.

The implementation is not a full implementation
of what is defined as step 2 in the PEP. However, I
don't think that we're that far away from that: all that's
needed is to encode a Unicode argument to compile()
to UTF-8 and then prepend either a BOM or a coding spec
before passing it to the compiler.
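
A minimal sketch of the two variants described above, written against
current Python 3 rather than the 2002 code base under review. It assumes
that compile() accepts bytes and honors a leading UTF-8 BOM or a coding
comment, which holds for Python 3. The helper names compile_with_bom and
compile_with_coding_spec are hypothetical and not part of the patch.

```python
import codecs


def compile_with_bom(source, filename="<string>", mode="exec"):
    """Hypothetical helper: encode text to UTF-8 and prepend a BOM."""
    data = codecs.BOM_UTF8 + source.encode("utf-8")
    return compile(data, filename, mode)


def compile_with_coding_spec(source, filename="<string>", mode="exec"):
    """Hypothetical helper: encode text to UTF-8 and prepend a coding comment.

    The extra first line shifts reported line numbers by one.
    """
    data = b"# -*- coding: utf-8 -*-\n" + source.encode("utf-8")
    return compile(data, filename, mode)


# A script containing a non-ASCII character in a string literal.
script = "s = 'gr\u00e4il'\n"

for helper in (compile_with_bom, compile_with_coding_spec):
    namespace = {}
    exec(helper(script), namespace)
    print(helper.__name__, repr(namespace["s"]))
```

The BOM variant keeps line numbers intact, which is probably the reason
to prefer it. A script that already carries its own coding comment would
need extra handling that this sketch leaves out.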

It would be nice to add a new tokenizer API which treats
the input as UTF-8 without looking for the coding
comment or BOM at all.
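
For illustration only: current Python 3 behaves much like the API wished
for here when compile() is given text instead of bytes. The sketch below
assumes Python 3 and says nothing about the patch under review. It
compiles the same script once as text and once as UTF-8 bytes, with a
coding comment that deliberately names a different encoding.

```python
# The coding comment claims latin-1, but the bytes below are UTF-8.
source = "# -*- coding: latin-1 -*-\ns = '\u00e4'\n"

# Text input: already decoded, so the coding comment is not consulted.
text_ns = {}
exec(compile(source, "<text>", "exec"), text_ns)

# Byte input: the coding comment is honored, so the two UTF-8 bytes of
# the character are decoded as two separate latin-1 characters.
bytes_ns = {}
exec(compile(source.encode("utf-8"), "<bytes>", "exec"), bytes_ns)

print(repr(text_ns["s"]), repr(bytes_ns["s"]))
```

The two results differ, which is the kind of mismatch Fredrik describes
between compiling decoded text and compiling the raw script.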

BTW, the approach mentioned in the PEP (converting the
complete tokenizer to use Py_UNICODE internally) is no
longer needed.

I think that the only way to give this code enough testing
is by letting Martin check it in and see what happens. Except
for the few XXX and CAUTION marks, the code looks OK.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/