[docs] [issue11230] "Full unicode import system" not in 3.2

Tom Christiansen report at bugs.python.org
Fri Aug 12 04:36:31 CEST 2011

Tom Christiansen <tchrist at perl.com> added the comment:

How does this work for modules that have filesystem names different from the one used for import? The issue I'm thinking about is that the Mac HFS+ filesystem keeps its Unicode filenames in (close to) NFD form. That means that a module named "caf\N{LATIN SMALL LETTER E WITH ACUTE}", with 4 graphemes and 4 code points in its name, winds up in the filesystem as "cafe\N{COMBINING ACUTE ACCENT}", still with 4 graphemes but now with 5 code points.
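The code point counts above can be checked with the standard unicodedata module. This is a minimal sketch: the variable names are made up for illustration, and it only models the normalization step, not what any particular filesystem actually does.

```python
import unicodedata

# The name as a user would typically type it: precomposed e-acute (NFC).
composed = "caf\N{LATIN SMALL LETTER E WITH ACUTE}"

# The same name decomposed to NFD, the form HFS+ is described as
# (approximately) using for stored filenames.
decomposed = unicodedata.normalize("NFD", composed)

print(len(composed))    # 4 code points
print(len(decomposed))  # 5 code points
print(decomposed == "cafe\N{COMBINING ACUTE ACCENT}")  # True

# The two spellings render the same but do not compare equal as strings.
print(composed == decomposed)  # False

# Normalizing both sides to one form before comparing restores equality.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

A lookup that compares the imported name against directory entries byte for byte, or code point for code point, would therefore miss the file unless one side is normalized first.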

I believe (well, suspect; I have empirical evidence, not proof) Python stores its own identifiers in NFD, so this may not be quite as much of a problem as it might otherwise be.  Nonetheless, I have had users complain about what HFS+ does with such filenames, although I am not quite sure why. I think it’s because they access a file with 4 chars but need a 5-char fileglob to wildcard it: touch "caf\N{LATIN SMALL LETTER E WITH ACUTE}", and then you need a wildcard of "?????", with an extra ?, to find it. Kinda weird.
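Both points in the paragraph above can be probed without an HFS+ volume. The following is a sketch under stated assumptions: fnmatch stands in for a shell glob (it matches one code point per "?"), the NFD string stands in for the name as the filesystem would hand it back, and the variable names are hypothetical. For the identifier question, PEP 3131 specifies that the parser converts identifiers to NFKC, which is a composed form rather than NFD, so the check below looks at what key actually ends up in the namespace.

```python
import fnmatch

# The filename as an NFD-normalizing filesystem would report it.
nfd_name = "cafe\N{COMBINING ACUTE ACCENT}"

# "?" matches exactly one code point, so four of them are not enough.
matches_four = fnmatch.fnmatchcase(nfd_name, "????")
matches_five = fnmatch.fnmatchcase(nfd_name, "?????")
print(matches_four, matches_five)

# Bind a variable whose source spelling is the NFD form, then inspect
# how the name was stored in the resulting namespace.
namespace = {}
exec(nfd_name + " = 1", namespace)
stored = [key for key in namespace if key.startswith("caf")]
print([len(key) for key in stored])
```

If the stored identifier comes back with 4 code points while the filesystem reports 5, then the import machinery has to normalize one side before comparing names.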

nosy: +tchrist

Python tracker <report at bugs.python.org>
