[Python-3000] Unicode strings, identifiers, and import

Michael Urman murman at gmail.com
Sun May 13 21:04:48 CEST 2007


This occurred to me while reading the PEP 3131 discussion, and while
it's not limited to PEP 3131 concerns, I don't believe I've seen it
discussed elsewhere yet. What is the interaction between import or
__import__ and Unicode module names (or at least Unicode strings
describing them)? Currently in Python 2.5, __import__ appears to coerce
its argument to str, leading to the following error case:

>>> import unicodedata
>>> __import__(unicodedata.lookup('GREEK SMALL LETTER EPSILON'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u03b5' in position 0: ordinal not in range(128)

With str being the Unicode type in py3k, this branch of the potential
problem needs to be addressed clearly, whether by defining __import__
as converting through ASCII, or by defining a useful semantic. If PEP
3131 is to be accepted, then it should probably address whether import
will work on non-ASCII identifiers, and if so what the semantics are
(if __import__ would otherwise limit to ASCII).
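
For concreteness, here is a rough sketch of what the "converting
through ASCII" option could look like if it were made explicit. The
helper name import_ascii_only is hypothetical and is not a proposal
for the real __import__; it only shows the name check happening up
front, with an import-related error instead of a bare
UnicodeEncodeError.

```python
import importlib
import unicodedata


def import_ascii_only(name):
    """Hypothetical helper: import `name` only if it is pure ASCII."""
    try:
        # The explicit equivalent of the implicit coercion seen above.
        name.encode('ascii')
    except UnicodeEncodeError:
        raise ImportError('non-ASCII module name: %r' % (name,))
    return importlib.import_module(name)


# An ASCII name is passed straight through to the normal import machinery.
json_module = import_ascii_only('json')

# A non-ASCII name is rejected before any filesystem lookup happens.
epsilon = unicodedata.lookup('GREEK SMALL LETTER EPSILON')
try:
    import_ascii_only(epsilon)
    rejected = False
except ImportError:
    rejected = True
```

The other option, a useful non-ASCII semantic, would replace the
rejection branch with a defined mapping from the name to a filename.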

I'm a little worried on the implementation side, because while on
Windows it should be easy to use the Unicode file APIs, on Linux the
filenames may or may not be UTF-8 friendly.
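
To illustrate the filename concern, a small sketch, assuming the
module file would be named after the identifier plus '.py' and that
the filesystem stores names as bytes in some locale encoding. The
helper encode_module_filename and the choice of encodings are only
for illustration.

```python
import unicodedata


def encode_module_filename(module_name, fs_encoding):
    """Return module_name + '.py' as bytes, or None if unrepresentable."""
    try:
        return (module_name + '.py').encode(fs_encoding)
    except UnicodeEncodeError:
        return None


name = unicodedata.lookup('GREEK SMALL LETTER EPSILON')

# The same module name maps to different on-disk bytes per encoding,
# and to nothing at all when the encoding cannot represent it.
utf8_bytes = encode_module_filename(name, 'utf-8')        # b'\xce\xb5.py'
greek_bytes = encode_module_filename(name, 'iso-8859-7')  # b'\xe5.py'
ascii_bytes = encode_module_filename(name, 'ascii')       # None
```

So a module saved under one locale might not be found under another,
even though the source-level identifier is identical.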

Michael
-- 
Michael Urman

