[Python-3000] Unicode strings, identifiers, and import

Guido van Rossum guido at python.org
Mon May 14 17:25:02 CEST 2007

On 5/14/07, Jean-Paul Calderone <exarkun at divmod.com> wrote:
> On Sun, 13 May 2007 22:03:26 -0500, Michael Urman <murman at gmail.com> wrote:
> >On 5/13/07, Guido van Rossum <guido at python.org> wrote:
> >> The answer to all of this is the filesystem encoding, which is already
> >> supported. Doesn't appear particularly difficult to me.
> >
> >Okay, that's fair. It seems reasonable to accept the limitations of
> >following the filesystem encoding for module names. I should probably
> >test py3k to make sure it already has updated __import__ to use the
> >filesystem encoding instead of the default encoding, but instead I'll
> >just feebly imply the question here.
>
> It's harder for this, actually.  Even if you know the encoding, you'll
> still run into problems when you don't know the normalization.  Consider
> the case where a developer creates a module with a non-ASCII name on OS X
> and then distributes it.  There is a fair to strong chance that their
> source code will use NFC for the module name.  During development, this
> will work just fine, as OS X normalizes all filename access to NFD.  When
> someone on another platform attempts to use the module though, they will
> mysteriously find that it cannot be found.  Their NFC spelling of the
> module name won't find the NFD file in the filesystem, and they will likely
> be completely baffled by the failure.
>
> This is, of course, an existing difficulty with dealing with unicode
> filenames in Python, but at least the interpreter itself doesn't yet
> have to concern itself with it, as no language features require it.
> I suspect that if non-ASCII module names are allowed, a lot of people
> will be running into this.

Isn't normalization also going to be an issue with using non-ASCII
identifiers in general? Does it mean that Python will have to normalize
identifiers before comparing them for equality? That's terrible, as it
will vastly increase the amount of work needed to hash a string, too.
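
To make the problem concrete, here is a minimal sketch using the standard
unicodedata module in current Python 3. The name "café" and the helpers
same_identifier() and find_module_file() are made up for illustration; this
is not how the import machinery works, only a demonstration that the NFC
and NFD spellings are different strings until both are normalized to the
same form.

```python
import unicodedata

# Hypothetical module name "café", spelled two ways.
nfc_name = "caf\u00e9"     # precomposed: U+00E9
nfd_name = "cafe\u0301"    # decomposed: 'e' followed by U+0301 (combining acute)


def same_identifier(a, b, form="NFC"):
    """Compare two names after normalizing both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


def find_module_file(name, filenames, form="NFC"):
    """Return the first filename whose stem matches name after
    normalization, or None if nothing matches."""
    wanted = unicodedata.normalize(form, name)
    for filename in filenames:
        stem = filename.rsplit(".", 1)[0]
        if unicodedata.normalize(form, stem) == wanted:
            return filename
    return None


# A plain comparison treats the two spellings as different strings,
# so a dict keyed on one spelling is not found via the other.
raw_equal = nfc_name == nfd_name
namespace = {nfd_name: "module object"}
raw_lookup = namespace.get(nfc_name)

# Normalizing both sides first makes them compare equal.
normalized_equal = same_identifier(nfc_name, nfd_name)

# An NFC name from source code matched against an NFD name on disk.
found = find_module_file(nfc_name, [nfd_name + ".py", "spam.py"])
```

Here raw_equal is False and raw_lookup is None, while normalized_equal is
True and found is the NFD-spelled file. The cost concern is visible too:
every comparison or lookup has to pay for a normalize() call unless names
are normalized once, up front, when they are first read.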

--Guido van Rossum (home page: http://www.python.org/~guido/)
