[Python-Dev] PEP 277 (unicode filenames): please review
Guido van Rossum
guido@python.org
Tue, 13 Aug 2002 09:01:25 -0400
> Here's a transcript of my Python session. The terminal has been set to
> render in latin-1. The directory contains one file, "frör"
> (fr-o-umlaut-r).
> sap!jack- python
> Python 2.3a0 (#32, Aug 12 2002, 15:31:25)
> [GCC 2.95.2 19991024 (release)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import os
> >>> os.listdir('.')
> ['fro\xcc\x88r']
> >>> utf8name = os.listdir('.')[0]
> >>> unicodename = utf8name.decode('utf-8')
> >>> unicodename
> u'fro\u0308r'
> >>> print unicodename.encode('latin-1')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeError: Latin-1 encoding error: ordinal not in range(256)
> >>>
>
> Sigh. \u0308 is not in the range(256), but the whole point of
> encode('latin-1') is to make it so, isn't it? And o-umlaut definitely
> has a latin-1 encoding. I tried the same with macroman instead of
> latin-1 (just to make sure this wasn't a bug in the latin-1 encoder),
> but still no go.
>
> What am I doing wrong?
Looks like it isn't you: the filename somehow contains a character
that's not in the Latin-1 subset of Unicode, and no encoding can fix
that for you. I don't know why -- you'll have to figure out why your
keyboard generates that character when you type o-umlaut.
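
For what it's worth, the transcript shows the name as 'o' followed by
U+0308 (COMBINING DIAERESIS), i.e. a decomposed o-umlaut. An encoding
alone will not merge the two, but Unicode normalization might. Below is a
minimal sketch, written in Python 3 syntax and not the 2.3a0 of the
session above; to_latin1 is a hypothetical helper name, and whether NFC
composition is appropriate for filenames on this platform is an
assumption, not something established in this thread.

```python
import unicodedata


def to_latin1(name):
    """Hypothetical helper: compose combining sequences, then encode."""
    # NFC merges a base letter and a following combining mark into a single
    # precomposed code point when Unicode defines one.
    composed = unicodedata.normalize("NFC", name)
    return composed.encode("latin-1")


# The name from the session: 'o' followed by U+0308 COMBINING DIAERESIS.
decomposed = "fro\u0308r"
latin1_bytes = to_latin1(decomposed)
```

Under these assumptions the composed name uses the single code point
U+00F6, which does fall inside Latin-1, so the encode step should no
longer raise. Names containing marks with no precomposed Latin-1
equivalent would still fail.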
--Guido van Rossum (home page: http://www.python.org/~guido/)