[Python-Dev] PEP 277 (unicode filenames): please review
Jack Jansen
Jack.Jansen@cwi.nl
Tue, 13 Aug 2002 13:32:33 +0200
I was going to suggest that if we return mixed sets of unicode/string
values from listdir() we could also do the same thing for platforms
where FileSystemDefaultEncoding is utf-8, such as MacOSX.
But as usual with unicode, when I actually try this it doesn't work, and
I don't understand why not. Why is unicode always something that seems
so simple and logical until you actually try it??!?!?
Here's a transcript of my Python session. The terminal has been set to
render in latin-1. The directory contains one file, "frör"
(fr-o-umlaut-r).
sap!jack- python
Python 2.3a0 (#32, Aug 12 2002, 15:31:25)
[GCC 2.95.2 19991024 (release)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.listdir('.')
['fro\xcc\x88r']
>>> utf8name = os.listdir('.')[0]
>>> unicodename = utf8name.decode('utf-8')
>>> unicodename
u'fro\u0308r'
>>> print unicodename.encode('latin-1')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: Latin-1 encoding error: ordinal not in range(256)
>>>
Sigh. \u0308 is not in range(256), but the whole point of
encode('latin-1') is to make it so, isn't it? And o-umlaut definitely
has a latin-1 encoding. I tried the same with macroman instead of
latin-1 (just to make sure this wasn't a bug in the latin-1 encoder),
but still no go.
What am I doing wrong?
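
One possible reading of the transcript, offered as a sketch and not as a
confirmed diagnosis: the name comes back in decomposed form, a plain "o"
followed by U+0308 (combining diaeresis), while latin-1 only has the
single precomposed character U+00F6. The encoder maps code points one at
a time and does not compose them first. The example below assumes a
current Python 3 with the standard unicodedata module, not the 2.3a0
build from the session, and the variable names are made up for
illustration.

```python
import unicodedata

# Decomposed name as seen in the transcript:
# 'o' followed by U+0308 COMBINING DIAERESIS.
decomposed = 'fro\u0308r'

# Encoding the decomposed form fails, because U+0308 itself
# has no latin-1 code point.
try:
    decomposed.encode('latin-1')
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True

# NFC normalization composes 'o' + U+0308 into the single
# code point U+00F6, which latin-1 can represent.
composed = unicodedata.normalize('NFC', decomposed)
latin1_bytes = composed.encode('latin-1')

print(encode_failed, repr(composed), latin1_bytes)
```

If that reading is right, normalizing to NFC before encoding would be
the missing step, here and in the macroman case.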
--
- Jack Jansen <Jack.Jansen@oratrix.com> http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -