[Python-3000] Unicode strings, identifiers, and import

Mon May 14 13:32:40 CEST 2007

On Sun, 13 May 2007 22:03:26 -0500, Michael Urman <murman at gmail.com> wrote:
>On 5/13/07, Guido van Rossum <guido at python.org> wrote:
>> The answer to all of this is the filesystem encoding, which is already
>> supported. Doesn't appear particularly difficult to me.
>
>Okay, that's fair. It seems reasonable to accept the limitations of
>following the filesystem encoding for module names. I should probably
>test py3k to make sure it already has updated __import__ to use the
>filesystem encoding instead of the default encoding, but instead I'll
>just feebly imply the question here.

It's actually harder than that.  Even if you know the encoding, you'll
still run into problems when you don't know the normalization.  Consider
the case where a developer creates a module with a non-ASCII name on OS X
and then distributes it.  There is a fair to strong chance that their
source code will use NFC for the module name.  During development, this
will work just fine, as OS X normalizes all filename access to NFD.  When
someone on another platform attempts to use the module though, they will
mysteriously find that it cannot be found.  Their NFC spelling of the
module name won't find the NFD file in the filesystem, and they will likely
be completely baffled by the failure.
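
To make the mismatch concrete, here is a minimal sketch using the standard
unicodedata module.  The name "café" is a hypothetical module name chosen
only for illustration; it does not come from this thread.

```python
import unicodedata

# Hypothetical module name, used only for illustration.
name = "caf\u00e9"

nfc = unicodedata.normalize("NFC", name)  # the spelling source code would likely use
nfd = unicodedata.normalize("NFD", name)  # the decomposed spelling stored on disk

# Both spellings render the same, but they are different code point sequences.
print(len(nfc), len(nfd))
print(nfc == nfd)

# A filesystem that compares names byte for byte sees two different names.
print(nfc.encode("utf-8") == nfd.encode("utf-8"))

# Normalizing both sides to the same form makes them compare equal again.
print(unicodedata.normalize("NFC", nfd) == nfc)
```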

This is, of course, an existing difficulty in dealing with Unicode
filenames in Python, but at least the interpreter itself doesn't yet
have to concern itself with it, as no language features require it.
I suspect that if non-ASCII module names are allowed, a lot of people
will be running into this.
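
As a rough sketch of the kind of workaround application code can use today,
one can compare names after normalizing both sides.  The helper name
find_normalized and the sample listing are assumptions made for this
example; nothing like this is part of the import machinery.

```python
import unicodedata

def find_normalized(wanted, candidates, form="NFC"):
    """Return the first candidate equal to `wanted` after normalization, else None."""
    target = unicodedata.normalize(form, wanted)
    for candidate in candidates:
        if unicodedata.normalize(form, candidate) == target:
            return candidate
    return None

# A listing such as a byte-preserving filesystem might report for an NFD file name.
listing = ["readme.txt", unicodedata.normalize("NFD", "caf\u00e9.py")]

# The NFC spelling still finds the NFD entry.
found = find_normalized("caf\u00e9.py", listing)
print(found is not None)
```

In practice the candidates would come from something like os.listdir(), and
the returned name (not the requested one) is what should be used to open
the file.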

Jean-Paul