[Python-Dev] Unicode strings as filenames

Skip Montanaro skip@pobox.com (Skip Montanaro)
Thu, 3 Jan 2002 09:11:01 -0600


What's the correct way to deal with filenames in a Unicode environment?

Consider this:

    >>> import site
    >>> site.encoding
    'latin-1'
    >>> a = "abc\xe4\xfc\xdf.txt"
    >>> u = unicode (a, "latin-1")
    >>> uu = u.encode ("utf-8")
    >>> open(a, "w")
    <open file 'abcäüß.txt', mode 'w' at 0x823c2a0>
    >>> open(u, "w")
    <open file 'abcäüß.txt', mode 'w' at 0x823a1e8>
    >>> open(uu, "w")
    <open file 'abcÃ¤Ã¼Ã.txt', mode 'w' at 0x81d6160>
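For comparison, the same three names can be checked byte for byte. This is a minimal sketch in Python 3 syntax, which is an assumption: the session above is Python 2.2, where `a` and `uu` are plain strings and `u` is a `unicode` object (in Python 3 they become `bytes`, `str` and `bytes`). It shows why the first two opens name the same file under a Latin-1 default encoding while the third names a different one.

```python
# Python 3 sketch; the original session is Python 2.
a = b"abc\xe4\xfc\xdf.txt"   # plain byte string, Latin-1 encoded
u = a.decode("latin-1")      # Unicode text: 'abcäüß.txt'
uu = u.encode("utf-8")       # byte string, UTF-8 encoded

# Encoding the text with the same codec recovers the Latin-1 bytes,
# so with a Latin-1 default encoding open(a) and open(u) hit the same file.
assert u.encode("latin-1") == a

# The UTF-8 bytes differ (each accented letter becomes two bytes),
# so open(uu) creates a different file.
assert uu == b"abc\xc3\xa4\xc3\xbc\xc3\x9f.txt"
assert uu != a
```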

If I change my site's default encoding back to ascii, the second open fails:

    >>> import site
    >>> site.encoding
    'ascii'
    >>> a = "abc\xe4\xfc\xdf.txt"
    >>> u = unicode (a, "latin-1")
    >>> uu = u.encode ("utf-8")
    >>> open(a, "w")
    <open file 'abcäüß.txt', mode 'w' at 0x822b448>
    >>> open(u, "w")
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    UnicodeError: ASCII encoding error: ordinal not in range(128)
    >>> open(uu, "w")
    <open file 'abcÃ¤Ã¼Ã.txt', mode 'w' at 0x822d260>

as I expect it should.  The third open is a problem as well, even though it
succeeds with either encoding.  (Why doesn't it fail when the default
encoding is ascii?)  My thought is that before using a plain string or a
unicode string as a filename it should first be coerced to a unicode string
with the default encoding, something like:

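    import site, types

    if type(fname) == types.StringType:
        # plain (byte) string: decode it with the site's default encoding
        fname = unicode(fname, site.encoding)
    elif type(fname) == types.UnicodeType:
        # unicode string: encode it with the site's default encoding
        fname = fname.encode(site.encoding)
    else:
        raise TypeError("unrecognized type for filename: %s" % type(fname))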

Is that the correct approach?  Apparently Python's file object doesn't do
this under the covers.  Should it?
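One way to make the proposed coercion concrete is the sketch below, again in Python 3 syntax and assuming a hypothetical helper named `coerce_filename` that takes the encoding as a parameter (standing in for `site.encoding`). Unlike the snippet above, it always returns bytes, so `open` would receive the same type whichever kind of name is passed; byte strings are validated by decoding them and are then returned unchanged.

```python
def coerce_filename(fname, encoding):
    """Hypothetical helper: return fname as bytes in `encoding`.

    Raises a UnicodeError subclass if fname is not valid in that encoding.
    """
    if isinstance(fname, bytes):
        # Byte string: check that it decodes under `encoding`, then pass it through.
        fname.decode(encoding)
        return fname
    if isinstance(fname, str):
        # Text string: encode it with `encoding`.
        return fname.encode(encoding)
    raise TypeError("unrecognized type for filename: %s" % type(fname).__name__)


names = [b"abc\xe4\xfc\xdf.txt",              # a: Latin-1 bytes
         "abc\xe4\xfc\xdf.txt",               # u: text
         b"abc\xc3\xa4\xc3\xbc\xc3\x9f.txt"]  # uu: UTF-8 bytes

for name in names:
    try:
        coerce_filename(name, "ascii")
        print("ok")
    except UnicodeError as exc:
        print(type(exc).__name__)
# prints UnicodeDecodeError, UnicodeEncodeError, UnicodeDecodeError
```

Under ascii all three names from the session are rejected, including the UTF-8 byte string that the third open accepted silently. Under latin-1 all three pass, because every byte sequence decodes as Latin-1, so this check cannot tell the UTF-8 name apart from a legitimate Latin-1 one.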

Thx,

Skip