[Python-Dev] Unicode source code
M.-A. Lemburg
mal@lemburg.com
Sun, 09 Feb 2003 17:39:59 +0100
Just van Rossum wrote:
> M.-A. Lemburg wrote:
>
>
>>Just van Rossum wrote:
>>
>>>Now that PEP 263 is in place (yet hotly debated on c.l.py ;-),
>>>wouldn't it be fairly small step to fully support unicode strings
>>>in compile(), eval() and exec? I notice these still attempt to
>>>convert unicode to 8 bit with the default encoding, which isn't
>>>very useful.
>>
>>Patches are most welcome.
>
> Some guidance on where to look is more than welcome.
The tokenizer/compiler works as follows (quote from another
email):
"""
source code using encoding ENC
   -> via codec for ENC into Unicode
   -> via UTF-8 codec into UTF-8 string
   -> tokenizer
   -> compiler

for 8-bit string literals in the source code
   -> UTF-8 string is converted back into encoding ENC

Provided that the encoding ENC is roundtrip safe
for all 256 base character ordinals, 8-bit strings
will turn out as-is in the compiled byte code.
"""
Now, to accept Unicode it would probably be worthwhile hooking
into this chain at step 2 (the UTF-8 encoding) rather than
step 1 (the decoding from ENC). The code for the tokenizer is
in Parser/tokenizer.c, the compiler code in Python/compile.c.
However, this is difficult because most APIs for compiling
code are built on char* buffers.
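
As a rough illustration of the chain quoted above, here is a minimal
Python sketch of what happens to one 8-bit string literal. It is not the
tokenizer's actual code (that lives in C); the helper name
roundtrip_literal and the choice of Latin-1 as ENC are assumptions made
for the example, and it is written for a current Python 3 interpreter.

```python
def roundtrip_literal(raw_bytes, enc):
    """Follow one 8-bit string literal through the chain described above."""
    as_unicode = raw_bytes.decode(enc)      # source in ENC -> Unicode
    as_utf8 = as_unicode.encode("utf-8")    # Unicode -> UTF-8 (tokenizer input)
    # For 8-bit string literals the UTF-8 data is converted back into ENC.
    return as_utf8.decode("utf-8").encode(enc)


# Latin-1 maps each of the 256 byte values to a code point, so every
# byte survives the trip unchanged.
all_bytes = bytes(range(256))
assert roundtrip_literal(all_bytes, "latin-1") == all_bytes
```

An encoding that cannot decode some byte values (plain ASCII, for
instance) would raise an error in the first step instead, which is
the "roundtrip safe" condition in the quote.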
A short-term solution would probably be to convert Unicode to
UTF-8 and prepend a UTF-8 BOM so that the tokenizer knows that
it is getting UTF-8. I haven't tested this, though.
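
The BOM idea can be tried from Python itself. The following is only a
sketch: the wrapper name compile_unicode is hypothetical, and it is
written for a current Python 3 interpreter, where compile() already
accepts text directly, so it merely shows the byte-level trick at work.

```python
import codecs


def compile_unicode(source, filename="<unicode>", mode="exec"):
    """Compile Unicode source by passing BOM-prefixed UTF-8 bytes."""
    # The UTF-8 BOM tells the tokenizer which encoding the bytes use.
    data = codecs.BOM_UTF8 + source.encode("utf-8")
    return compile(data, filename, mode)


namespace = {}
exec(compile_unicode("greeting = 'gr\u00fc\u00df dich'\n"), namespace)
assert namespace["greeting"] == "gr\u00fc\u00df dich"
```

Per PEP 263, a coding declaration inside the text that names anything
other than UTF-8 conflicts with the BOM, so the Unicode source should
not carry one.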
A slightly better solution (on narrow Unicode Python builds)
would be to use UTF-16 for this. The UTF-16 support in the
tokenizer would have to be enabled for this, though. It is
currently disabled for some reason I don't remember. Martin
should know... but he's on vacation.
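
One practical difference between the two encodings when data travels
through char* buffers can be seen in a few lines. This is only an
observation about the byte layout; it is not a claim about why the
UTF-16 support is disabled.

```python
text = "x = 1\n"
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")   # little-endian, no BOM

# UTF-8 keeps pure ASCII source free of NUL bytes ...
assert b"\x00" not in utf8
# ... while UTF-16 uses two bytes per ASCII character, one of them NUL,
# so a NUL-terminated char* cannot carry it without an explicit length.
assert b"\x00" in utf16
assert len(utf16) == 2 * len(text)
```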
--
Marc-Andre Lemburg
eGenix.com