[Python-Dev] PEP 383 (again)

Lino Mastrodomenico l.mastrodomenico at gmail.com
Tue Apr 28 14:29:19 CEST 2009

2009/4/28 Thomas Breuel <tmbdev at gmail.com>:
> If we follow PEP 383, you will get lots of errors anyway because those
> strings, when encoded in utf-8b, will result in an error when you try to
> write them on a Windows file system or any other system that doesn't allow
> the byte sequences that the utf-8b encodes.

I'm not sure whether, when you say "write them on a Windows FS", you
mean from within Windows itself or on a Windows filesystem mounted on
another OS, so I'll cover both cases.

Let's suppose that I use Python 2.x or something else to create a file
with name b'\xff'. My (Linux) system has a sane configuration and the
filesystem encoding is UTF-8, so the name is invalid, but the kernel
will blindly accept it anyway.

With this PEP, Python 3.1 listdir() will convert b'\xff' to the string '\udcff'.
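The conversion can be reproduced directly with the codec error handler that Python 3.1 ships as `surrogateescape` (the thread calls the scheme "utf-8b"). This is only a sketch using the codec machinery rather than `listdir()` itself, and it assumes a UTF-8 filesystem encoding as in the setup above:

```python
raw_name = b'\xff'  # the invalid-as-UTF-8 file name from the example

# A strict UTF-8 decode cannot handle this byte...
try:
    raw_name.decode('utf-8')
except UnicodeDecodeError as exc:
    print('strict decode fails:', exc.reason)

# ...but surrogateescape maps each undecodable byte 0xXX to U+DCXX.
name = raw_name.decode('utf-8', 'surrogateescape')
print(ascii(name))  # '\udcff'

# The mapping is lossless: encoding back restores the original bytes.
assert name.encode('utf-8', 'surrogateescape') == raw_name
```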

Now if this string somehow ends up in a Python 3.1 program running on
Windows and it tries to create a file with this name, it will work (no
exception will be raised). The Windows GUI will display the standard
"invalid character" symbol (an empty box) when listing this file, but
this seems reasonable since the original file was displayed as "?" by
the Linux console and with another invalid character symbol by the
GNOME file manager.
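One way to see why Windows tolerates this name: Windows file names are sequences of 16-bit UTF-16 code units, and a lone surrogate such as U+DCFF is still a single storable code unit even though it is not well-formed UTF-16. A small sketch (it assumes Python 3.4 or later, where `surrogatepass` works with the UTF-16 codecs):

```python
name = '\udcff'  # the string listdir() produced for b'\xff'

# A strict UTF-16 encoder refuses an unpaired surrogate...
try:
    name.encode('utf-16-le')
except UnicodeEncodeError as exc:
    print('strict UTF-16 encode fails:', exc.reason)

# ...but the raw code unit 0xDCFF is an ordinary 16-bit value.
units = name.encode('utf-16-le', 'surrogatepass')
print(units)  # b'\xff\xdc' (one little-endian code unit)
```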

OTOH, if I write the same file to a Windows filesystem mounted on
another OS, an automatic translation (probably done by the OS kernel)
will be in place from the user-visible filesystem encoding (see e.g.
the "iocharset" or "utf8" mount options for vfat on Linux) to UTF-16.
This means that the write will fail with something like:

IOError: [Errno 22] invalid filename: b'/media/windows_disk/\xff'

(The "problem" is that a vfat filesystem mounted with the "utf8"
option on Linux will only accept byte sequences that are valid UTF-8,
or at least reasonably similar: e.g. b'\xed\xb3\xbf' is accepted.)
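The two byte sequences mentioned here are two different encodings of the same string '\udcff'. The sketch below only shows the codec side; the accept/reject behaviour of a vfat "utf8" mount is taken from the message, not tested by the code:

```python
name = '\udcff'  # what listdir() produced for b'\xff'

# surrogateescape restores the original byte, which is not valid UTF-8;
# this is the name the vfat "utf8" mount rejects with EINVAL.
escaped = name.encode('utf-8', 'surrogateescape')

# surrogatepass encodes the surrogate code point itself as a 3-byte
# UTF-8-style sequence; per the message, vfat accepts this one.
passed = name.encode('utf-8', 'surrogatepass')

print(escaped)  # b'\xff'
print(passed)   # b'\xed\xb3\xbf'

# Strictly speaking, encoded surrogates are not valid UTF-8 either,
# which is why they are only "reasonably similar".
try:
    passed.decode('utf-8')
except UnicodeDecodeError:
    print('a strict UTF-8 decoder rejects it too')
```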

Again, this seems reasonable, since it already happens in Python 2 and
with pretty much any other software, including GNU cp.

I don't see how Martin can do better than this.

Well, ok, I guess he could break into my house and rename the original
file to something sane...

Lino Mastrodomenico
