[Python-Dev] PEP 277 (unicode filenames): please review

Guido van Rossum guido@python.org
Tue, 13 Aug 2002 09:01:25 -0400


> Here's a transcript of my Python session. The terminal has been set to 
> render in latin-1. The directory contains one file, "frör" 
> (fr-o-umlaut-r).
> sap!jack- python
> Python 2.3a0 (#32, Aug 12 2002, 15:31:25)
> [GCC 2.95.2 19991024 (release)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
>  >>> import os
>  >>> os.listdir('.')
> ['fro\xcc\x88r']
>  >>> utf8name = os.listdir('.')[0]
>  >>> unicodename = utf8name.decode('utf-8')
>  >>> unicodename
> u'fro\u0308r'
>  >>> print unicodename.encode('latin-1')
> Traceback (most recent call last):
>    File "<stdin>", line 1, in ?
> UnicodeError: Latin-1 encoding error: ordinal not in range(256)
>  >>>
> 
> Sigh. \u0308 is not in the range(256), but the whole point of 
> encode('latin-1') is to make it so, isn't it? And o-umlaut definitely 
> has a latin-1 encoding. I tried the same with macroman instead of 
> latin-1 (just to make sure this wasn't a bug in the latin-1 encoder), 
> but still no go.
> 
> What am I doing wrong?

Looks like it isn't you: the filename somehow contains a character
(\u0308, a combining diaeresis) that's not in the Latin-1 subset of
Unicode, and no encoding can fix that for you.  I don't know why --
you'll have to figure out why your keyboard generates that character
when you type o-umlaut.
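
A minimal sketch of one possible workaround. It rests on an assumption
that is not established in this thread: that the name is stored in
decomposed form ('o' followed by the combining character \u0308), in
which case composing it to the single character \u00f6 before encoding
may be enough. The sketch uses the standard unicodedata module and
modern Python syntax, not the 2.3a0 interpreter from the transcript;
the literal below simply repeats the decoded name shown above.

```python
import unicodedata

# Decomposed name as decoded in the transcript:
# 'o' followed by U+0308 COMBINING DIAERESIS.
decomposed = "fro\u0308r"

# Encoding the decomposed form fails, because U+0308 has no Latin-1 byte.
try:
    decomposed.encode("latin-1")
    direct_encode_failed = False
except UnicodeEncodeError:
    direct_encode_failed = True

# NFC normalization composes 'o' + U+0308 into the single code point U+00F6.
composed = unicodedata.normalize("NFC", decomposed)

# U+00F6 is inside the Latin-1 range, so this encode succeeds.
latin1_bytes = composed.encode("latin-1")

print(direct_encode_failed, repr(composed), latin1_bytes)
```

Composition only helps when a precomposed character exists in the
target encoding; a combining sequence with no precomposed equivalent
would still fail to encode.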

--Guido van Rossum (home page: http://www.python.org/~guido/)