[Python-Dev] PEP 277 (unicode filenames): please review
Fredrik Lundh
fredrik@pythonware.com
Tue, 13 Aug 2002 14:17:54 +0200
jack wrote:
> Sigh. \u0308 is not in the range(256), but the whole point of
> encode('latin-1') is to make it so, isn't it?
Define "make it so"?
The encoders convert unicode code points to corresponding code
points in the given 8-bit encoding. One character in, one character
out (unless the target encoding is a multibyte encoding, like utf-8).
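To make the one-in, one-out behaviour concrete, here is a minimal sketch. It assumes a modern Python 3 interpreter, which is not what this thread was written against, and the helper name try_latin1 is made up for illustration. The precomposed character maps to a single latin-1 byte, while the decomposed pair fails because the combining mark has no latin-1 code point.

```python
# Precomposed form: one code point, LATIN SMALL LETTER O WITH DIAERESIS.
composed = "\u00f6"
# Decomposed form: "o" followed by COMBINING DIAERESIS (U+0308).
decomposed = "o\u0308"


def try_latin1(text):
    """Hypothetical helper: return the latin-1 bytes, or None if unencodable."""
    try:
        return text.encode("latin-1")
    except UnicodeEncodeError:
        # The encoder maps code points one at a time; it does not compose
        # "o" + U+0308 into U+00F6 on the caller's behalf.
        return None


composed_bytes = try_latin1(composed)
decomposed_bytes = try_latin1(decomposed)
```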
This works perfectly well if producers follow the "early uniform
normalization" rule (everything else is madness). For some reason,
your listdir implementation doesn't.
Instead of returning LATIN SMALL LETTER O WITH DIAERESIS (\u00f6),
it returns multiple unicode characters. I'd say it's broken.
As far as I know, there's no standard unicode normalizer in Python.
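For readers on later Python releases: unicodedata.normalize was added to the standard library after this message was written (in Python 2.3). The following is only a sketch of how a producer could apply early normalization, assuming Python 3 and assuming NFC is the wanted form; the function name normalize_names is hypothetical and not part of any listdir implementation.

```python
import unicodedata


def normalize_names(names):
    """Hypothetical helper: return the names in composed (NFC) form."""
    return [unicodedata.normalize("NFC", name) for name in names]


# A decomposed name, such as a filesystem might hand back: "o" + U+0308.
raw_names = ["o\u0308.txt"]
clean_names = normalize_names(raw_names)

# After normalization the name is made of precomposed characters only,
# so the latin-1 encoder can map it code point by code point.
encoded = clean_names[0].encode("latin-1")
```

Normalizing once, at the point where the names enter the program, keeps every later encode() call simple.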
</F>