[Python-Dev] PEP 277 (unicode filenames): please review

Fredrik Lundh fredrik@pythonware.com
Tue, 13 Aug 2002 14:17:54 +0200

jack wrote:

> Sigh. \u0308 is not in the range(256), but the whole point of
> encode('latin-1') is to make it so, isn't it?

Define "make it so"?

The encoders convert unicode code points to corresponding code
points in the given 8-bit encoding.  One character in, one character
out (unless the target encoding is a multibyte encoding, like utf-8).
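A small illustration of that one-in, one-out behaviour, written in modern Python 3 syntax rather than the 2.2-era API under discussion. The strings are chosen to match this thread: U+00F6 maps to a single Latin-1 byte and to two UTF-8 bytes. The sequence "o" + U+0308 is refused, because U+0308 has no Latin-1 code point and the codec does not merge it with the preceding letter.

```python
# One code point in, one byte out for an 8-bit codec such as Latin-1.
precomposed = "\u00f6"            # LATIN SMALL LETTER O WITH DIAERESIS
latin1_bytes = precomposed.encode("latin-1")
print(latin1_bytes)               # b'\xf6'

# UTF-8 is a multibyte encoding: the same single code point becomes two bytes.
utf8_bytes = precomposed.encode("utf-8")
print(utf8_bytes)                 # b'\xc3\xb6'

# A decomposed sequence contains U+0308, which has no Latin-1 counterpart,
# so the codec raises an error instead of combining it with the 'o'.
decomposed = "o\u0308"
try:
    decomposed.encode("latin-1")
    decomposed_ok = True
except UnicodeEncodeError as exc:
    decomposed_ok = False
    print(exc.reason)             # ordinal not in range(256)
```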

This works perfectly well if producers follow the "early uniform
normalization" rule (everything else is madness).  For some reason,
your listdir implementation doesn't.

Instead of returning LATIN SMALL LETTER O WITH DIAERESIS (\u00f6),
it returns two unicode characters: LATIN SMALL LETTER O followed by
COMBINING DIAERESIS (\u0308).  I'd say it's broken.
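To make the difference concrete, here is a Python 3 sketch. It assumes the decomposed listdir result for "ö" is "o" followed by U+0308, the character quoted at the top of this message. `code_point_names` is a hypothetical helper built on the standard `unicodedata.name()` function. The two strings display identically but are not equal.

```python
import unicodedata

precomposed = "\u00f6"     # what an NFC-producing listdir would return
decomposed = "o\u0308"     # assumed output of a decomposing listdir

def code_point_names(s):
    """Return the Unicode character name of every code point in s."""
    return [unicodedata.name(ch) for ch in s]

print(code_point_names(precomposed))
# ['LATIN SMALL LETTER O WITH DIAERESIS']
print(code_point_names(decomposed))
# ['LATIN SMALL LETTER O', 'COMBINING DIAERESIS']
print(precomposed == decomposed)   # False
```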

As far as I know, there's no standard unicode normalizer in Python.
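That was the state of things when this was written. Python 2.3 and later ship `unicodedata.normalize()`, which lets a producer apply "early uniform normalization" itself. The following is a minimal Python 3 sketch, not a proposed listdir implementation. `normalize_names` is a hypothetical helper, and NFC is assumed to be the desired uniform form.

```python
import unicodedata
from typing import Iterable, List

def normalize_names(names: Iterable[str], form: str = "NFC") -> List[str]:
    """Hypothetical helper: bring every filename into one Unicode form.

    NFC composes 'o' + U+0308 into the single code point U+00F6,
    which an 8-bit codec such as Latin-1 can then encode.
    """
    return [unicodedata.normalize(form, name) for name in names]

raw = ["o\u0308", "plain.txt"]       # decomposed entry, as reported
fixed = normalize_names(raw)
print(fixed[0] == "\u00f6")          # True
print(fixed[0].encode("latin-1"))    # b'\xf6'
```

In practice a caller would wrap the directory listing, e.g. `normalize_names(os.listdir(path))`. Passing `form="NFD"` goes the other way and splits U+00F6 back into "o" + U+0308.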