[Python-Dev] what Windows and Linux really do Re: PEP 383 (again)

"Martin v. Löwis" martin at v.loewis.de
Thu Apr 30 10:21:39 CEST 2009

Thomas Breuel wrote:
> Given the stated rationale of PEP 383, I was wondering what Windows
> actually does.  So, I created some ISO8859-15 and ISO8859-8 encoded file
> names on a device, plugged them into my Windows Vista machine, and fired
> up Python 3.0.

How did you do that, and which specific names did you choose?
How does Explorer display the file names?

> First, os.listdir("f:") returns a list of strings for those file
> names... but those unicode strings are illegal.

What was the exact result that you got?

> You can't even print them without getting an error from Python.

This is unrelated to the PEP. Try to run the same code in IDLE,
or use the ascii() function.
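
As a rough sketch of the ascii() suggestion: the file name below is made up (the names actually used are not known), and cp1252 is only an assumed stand-in for whatever encoding the console happened to use. It illustrates that a string can be perfectly valid and still fail to print, and that ascii() sidesteps the problem by escaping every non-ASCII character.

```python
# Hypothetical file name; the names actually used in the thread are not known.
name = "\u05e9\u05dc\u05d5\u05dd.txt"  # Hebrew letters, as an ISO8859-8 name might decode to

# Simulate a console whose encoding cannot represent these characters.
# cp1252 is an assumption, not necessarily the code page of the real console.
try:
    name.encode("cp1252")
    printable = True
except UnicodeEncodeError:
    printable = False

# ascii() escapes every non-ASCII character, so its result is pure ASCII
# and can be written to any console.
escaped = ascii(name)
print(printable, escaped)
```

If the escaped form shows ordinary code points, the print error came from the console encoding, not from the string itself.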

> What about round tripping? So, if you take a malformed file name from an
> external device (say, because it was actually encoded iso8859-15 or East
> Asian) and write it to an NTFS directory, it seems to write malformed
> UTF-16 file names.  In essence, Windows doesn't really use unicode, it
> just implements 16bit raw character strings, just like UNIX historically
> implements raw 8bit character strings.

I think you misinterpreted what you saw. To find out in what way you
misinterpreted it, we would have to know what it is that you saw.

> I think this calls into
> question the rationale behind PEP 383, and we should first look into
> what the roadmap for UNIX/Linux and UTF-8 actually is.  UNIX may have
> consistent unicode support (via UTF-8) before Windows.

If so, PEP 383 won't hurt. If you never get decode errors for file
names, you can just ignore PEP 383. It's only for those of us who do
get decode errors.
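
To illustrate, here is a minimal sketch, assuming the error handler is available under the name "surrogateescape" (the name it has in Python 3.1 and later) and using a made-up byte string as the file name. Names that decode cleanly are unaffected; only undecodable bytes are mapped to lone surrogates, and they round-trip back to the original bytes.

```python
# Hypothetical raw file name: 0xE9 is "é" in ISO8859-15 but invalid as UTF-8 here.
raw = b"caf\xe9.txt"

# Strict decoding fails; this is the decode error the PEP is meant to handle.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With the surrogateescape handler, the bad byte becomes the lone surrogate U+DCE9 ...
name = raw.decode("utf-8", "surrogateescape")

# ... and encoding with the same handler restores the original bytes exactly.
roundtrip = name.encode("utf-8", "surrogateescape")

# A name that is valid UTF-8 decodes to the same string with or without the handler.
clean = b"cafe.txt".decode("utf-8", "surrogateescape")

print(strict_ok, ascii(name), roundtrip == raw, clean)
```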

