[Python-Dev] what Windows and Linux really do Re: PEP 383 (again)
tmbdev at gmail.com
Thu Apr 30 09:21:54 CEST 2009
Given the stated rationale of PEP 383, I was wondering what Windows actually
does. So, I created some ISO8859-15 and ISO8859-8 encoded file names on a
device, plugged them into my Windows Vista machine, and fired up Python 3.0.
First, os.listdir("f:") returns a list of strings for those file names,
but those Unicode strings are invalid.
You can't even print them without getting an error from Python. In fact,
you also can't print strings containing the proposed half-surrogate
encodings: in both cases, the output encoder rejects them with a
UnicodeEncodeError. (If not even Python, with its generally lenient
attitude, can print those things, some other libraries will probably fail
too.)
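To make the "half-surrogate encodings" concrete, here is a minimal sketch. It assumes the error handler that PEP 383 eventually shipped in Python 3.1 under the name `surrogateescape`; the Python 3.0 used above does not have it. Each byte that is not valid UTF-8 becomes a lone surrogate U+DC80..U+DCFF, and a strict UTF-8 encoder (like the one behind a UTF-8 console) then refuses the string.

```python
# A file name whose bytes are ISO8859-15, not UTF-8: 0xE9 is 'é' in Latin-9.
raw_name = "café".encode("iso8859-15")  # b'caf\xe9'

# Decoding as UTF-8 with surrogateescape maps the undecodable byte 0xE9
# to the lone (half) surrogate U+DCE9 instead of raising an error.
name = raw_name.decode("utf-8", "surrogateescape")
print(repr(name))  # 'caf\udce9'

# A strict encoder rejects the lone surrogate, which is why printing fails.
encode_error = None
try:
    name.encode("utf-8")
except UnicodeEncodeError as exc:
    encode_error = exc
    print("cannot encode:", exc.reason)
```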
What about round-tripping? If you take a malformed file name from an
external device (say, because it was actually encoded in ISO8859-15 or an
East Asian encoding) and write it to an NTFS directory, Windows seems to
write malformed UTF-16 file names. In essence, Windows doesn't really use
Unicode; it just implements 16-bit raw character strings, just as UNIX has
historically implemented raw 8-bit character strings.
Then I tried the same thing on my Ubuntu 9.04 machine. It turns out that,
unlike Windows, Linux seems to be moving toward consistent use of valid
UTF-8. If you plug in an external device and nothing else is known about
it, it gets mounted with the utf8 option, and the kernel actually seems to
enforce UTF-8 encoding. I think this calls into question the rationale
behind PEP 383, and we should first look into what the roadmap for
UNIX/Linux and UTF-8 actually is. UNIX may have consistent Unicode support
(via UTF-8) before Windows does.
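The check that a UTF-8-enforcing mount implies can be sketched as follows. `is_valid_utf8_name` is a hypothetical helper, not a kernel or Python API, and it only tests the encoding (not other file name rules such as '/' or NUL). Python's strict UTF-8 decoder rejects both stray Latin-9 bytes and encoded surrogates, so names like the ones above would be refused.

```python
def is_valid_utf8_name(raw: bytes) -> bool:
    """Hypothetical helper: True if raw is well-formed UTF-8."""
    try:
        # Strict decoding rejects invalid bytes and encoded surrogates.
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

print(is_valid_utf8_name("café".encode("utf-8")))       # True
print(is_valid_utf8_name("café".encode("iso8859-15")))  # False
```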
As I was saying, I think PEP 383 needs a lot more thought and research...