[Python-Dev] Unicode strings as filenames
Skip Montanaro
skip@pobox.com (Skip Montanaro)
Thu, 3 Jan 2002 09:11:01 -0600
What's the correct way to deal with filenames in a Unicode environment?
Consider this:
>>> import site
>>> site.encoding
'latin-1'
>>> a = "abc\xe4\xfc\xdf.txt"
>>> u = unicode(a, "latin-1")
>>> uu = u.encode("utf-8")
>>> open(a, "w")
<open file 'abcäüß.txt', mode 'w' at 0x823c2a0>
>>> open(u, "w")
<open file 'abcäüß.txt', mode 'w' at 0x823a1e8>
>>> open(uu, "w")
<open file 'abcÃ¤Ã¼Ã.txt', mode 'w' at 0x81d6160>
If I change my site's default encoding back to ascii, the second open fails:
>>> import site
>>> site.encoding
'ascii'
>>> a = "abc\xe4\xfc\xdf.txt"
>>> u = unicode(a, "latin-1")
>>> uu = u.encode("utf-8")
>>> open(a, "w")
<open file 'abcäüß.txt', mode 'w' at 0x822b448>
>>> open(u, "w")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: ASCII encoding error: ordinal not in range(128)
>>> open(uu, "w")
<open file 'abcÃ¤Ã¼Ã.txt', mode 'w' at 0x822d260>
as I expect it should. The third open is a problem as well, even though
it succeeds with either encoding. (Why doesn't it fail when the default
encoding is ascii?) My thought is that before using a plain string or a
unicode string as a filename it should first be coerced to a unicode
string with the default encoding, something like:
    if type(fname) == types.StringType:
        fname = unicode(fname, site.encoding)
    elif type(fname) == types.UnicodeType:
        fname = fname.encode(site.encoding)
    else:
        raise TypeError, ("unrecognized type for filename: %s"%type(fname))
Is that the correct approach? Apparently Python's file object doesn't do
this under the covers. Should it?
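
For anyone who wants to poke at this outside the interpreter sessions
above, here is a rough sketch of the idea. It is written for a current
Python 3 interpreter rather than the Python 2 used above, so "str" here
plays the role of the unicode string and "bytes" the role of the plain
string. The helper name to_filename_bytes and its encoding parameter
(standing in for site.encoding) are made up for illustration; this is not
what the file object does. Unlike the snippet above, it always normalizes
toward an encoded byte string, and it passes an already-encoded name
through untouched. That pass-through is presumably also why the third
open never fails: no encoding step is involved.

```python
def to_filename_bytes(fname, encoding="latin-1"):
    """Hypothetical helper: return fname as an encoded byte string.

    'encoding' stands in for the site-wide default encoding.
    """
    if isinstance(fname, bytes):
        # Already encoded: nothing is converted, so nothing can fail here.
        return fname
    if isinstance(fname, str):
        # Text name: encode it; raises UnicodeEncodeError if it does not fit.
        return fname.encode(encoding)
    raise TypeError("unrecognized type for filename: %s" % type(fname))


name = "abc\xe4\xfc\xdf.txt"

# The same text name yields two different byte sequences,
# which is why the first and third open create different files.
latin1_name = to_filename_bytes(name, "latin-1")
utf8_name = to_filename_bytes(name, "utf-8")
print(latin1_name)
print(utf8_name)

# With an ascii default, encoding the text name fails (the second open).
try:
    to_filename_bytes(name, "ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)

# An already-encoded name goes through regardless of the default.
passed_through = to_filename_bytes(utf8_name, "ascii")
```

Whether a pre-encoded name should be passed through like this, or
decoded and re-encoded so that every name ends up in one encoding, is
exactly the open question.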
Thx,
Skip