Unicode program representation

Moshe Zadka moshez at math.huji.ac.il
Mon Apr 3 18:15:53 EDT 2000


On Mon, 3 Apr 2000, Fredrik Lundh wrote:

> hmm.  wouldn't that mean that we end up using different encodings
> in different parts of the script?  feels a little scary, to say the least...

FWIW, I believe that Python's (official) parser should be strictly UTF-8.
It could be done in stages:
 -- stage 1: warn about non-ASCII characters in scripts (warning framework
     TBD)
 -- stage 2: don't accept non-ASCII characters in scripts at all
 -- stage 3: assume all scripts are UTF-8
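A rough sketch of how the three stages could behave when applied to the
raw bytes of a script. The function name check_source and the integer
stage argument are hypothetical, made up for illustration; the standard
warnings module stands in for the warning framework that is still TBD.

```python
import warnings


def check_source(data, stage):
    """Decode script bytes under one of the three proposed stages.

    Hypothetical helper: the name and the `stage` argument are
    assumptions for illustration, not part of any real parser.
    """
    if stage == 3:
        # Stage 3: every script is assumed to be UTF-8.
        return data.decode("utf-8")

    # Stages 1 and 2: look for the first non-ASCII byte, if any.
    for offset, byte in enumerate(data):
        if byte > 0x7F:
            line = data.count(b"\n", 0, offset) + 1
            message = "non-ASCII byte 0x%02x on line %d" % (byte, line)
            if stage == 1:
                # Stage 1: warn once and keep going.
                warnings.warn(message, stacklevel=2)
                break
            # Stage 2: refuse the script outright.
            raise SyntaxError(message)

    # Decode leniently so a stage 1 warning does not turn into an error.
    return data.decode("ascii", errors="replace")
```

Under this sketch a pure-ASCII script passes through all three stages
unchanged, which is what would make a staged migration painless for
most existing code.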

This isn't an "I've been using this feature for years" issue the way
"socket.connect" was: it's trivial to write a script to convert between
any encoding and UTF-8 (in Python, of course <wink>).
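A minimal sketch of such a conversion script, assuming the source
encoding of each file is already known and is supported by Python's
codecs. The function names to_utf8 and convert_file are hypothetical.

```python
def to_utf8(data, source_encoding):
    """Re-encode bytes from source_encoding into UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


def convert_file(src_path, dst_path, source_encoding):
    """Read src_path in source_encoding and write it to dst_path as UTF-8."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(to_utf8(data, source_encoding))
```

For example, to_utf8(b"\xe9", "latin-1") gives b"\xc3\xa9". The hard
part in practice is knowing the source encoding, since this sketch
does not try to guess it.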

However, there is a non-trivial issue here: should Python identifiers
be able to include all characters Unicode defines as letters?
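To make the question concrete, here is one possible rule, sketched
under the assumption that "letter" means any character whose Unicode
general category starts with "L". The function names are hypothetical,
and this is not a claim about what the parser does or should do.

```python
import unicodedata


def is_letter(ch):
    """True if Unicode classifies ch as a letter (categories Lu, Ll, Lt, Lm, Lo)."""
    return unicodedata.category(ch).startswith("L")


def is_candidate_identifier(name):
    """Check name against an assumed rule: letter or underscore first,
    then letters, underscores, or ASCII digits."""
    if not name:
        return False
    first, rest = name[0], name[1:]
    if not (first == "_" or is_letter(first)):
        return False
    return all(c == "_" or is_letter(c) or c in "0123456789" for c in rest)
```

Under this rule both "caf\u00e9" and "\u05e9\u05dc\u05d5\u05dd" would
be accepted, while "2x" and "a-b" would still be rejected.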
--
Moshe Zadka <mzadka at geocities.com>. 
http://www.oreilly.com/news/prescod_0300.html
http://www.linux.org.il -- we put the penguin in .com




