[Python-Dev] a suggestion ... Re: PEP 383 (again)
tmbdev at gmail.com
Thu Apr 30 05:16:20 CEST 2009
> The whole purpose of PEP 383 is to send the exact same bytes that were
> read from the OS back to the OS => violating (2) (for whatever the
> apparent system file-encoding is, not limited to UTF-8),
It's fine to read a file name from a file system and write the same file
back as the same raw byte sequence. That I don't have a problem with; it's
not quite right, but it's harmless.
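To make that round trip concrete, here is a minimal sketch. It assumes the error handler is spelled "surrogateescape", as it is in Python 3.1 and later, and the file name bytes are made up for illustration.

```python
# Hypothetical file name bytes: Latin-1 "cafe" with an accented e,
# which is not valid UTF-8.
raw = b"caf\xe9.txt"

# The undecodable byte 0xE9 is mapped to the lone surrogate U+DCE9.
name = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = name.encode("utf-8", errors="surrogateescape")

print(repr(name))
print(round_tripped == raw)
```

As long as the string goes straight back to the OS through the same handler, nothing is lost.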
The problem with this PEP is that the malformed unicode it produces can end
up in so many other places: as file names on another file system, in string
processing libraries, in text files, in databases, in user interfaces,
etc. Some of those destinations will use the utf-8b decoder, so they will
get byte sequences that never could occur before and that are illegal under
Nobody knows what will happen. And, yes, Martin is proposing that this is
the default behavior.
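A small sketch of how such a string behaves once it leaves the file system code, again assuming the "surrogateescape" spelling from Python 3.1 and later. The helper try_strict_utf8 and the file names are hypothetical, and a strict UTF-8 encoder stands in for any downstream consumer that expects well-formed unicode.

```python
def try_strict_utf8(text):
    """Return the strict UTF-8 encoding of text, or None if it is rejected."""
    try:
        return text.encode("utf-8")
    except UnicodeEncodeError:
        return None

# A string produced by decoding an undecodable (hypothetical) file name.
escaped = b"caf\xe9.txt".decode("utf-8", errors="surrogateescape")
clean = "cafe.txt"

# The escaped string carries a lone surrogate in the range U+DC80..U+DCFF.
has_lone_surrogate = any(0xDC80 <= ord(ch) <= 0xDCFF for ch in escaped)

# A strict encoder accepts the clean name and rejects the escaped one.
strict_result_clean = try_strict_utf8(clean)
strict_result_escaped = try_strict_utf8(escaped)

print(has_lone_surrogate, strict_result_clean, strict_result_escaped)
```

A consumer that is strict fails like this; what a lenient one does with the lone surrogate is exactly the open question.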
There are several other issues that are unresolved: utf-8b makes some
current practices illegal; for example, it might break CESU-8 encodings.
Also, what are Jython and IronPython supposed to do on UNIX? Can they
implement these semantics at all?
> and that has overwhelmingly popular support.
I think people don't fully understand the tradeoffs. I certainly don't.
Although there is a slight benefit, there are unknown and potentially large
costs. We'd be changing Python's entire unicode string behavior for the sake
of one use case. Since our uses of Python actually involve a lot of
unicode, I am wary of having malformed unicode crop up legally in Python
And that's why I think this proposal should be shelved for a while until
people have had more time to try to understand the issues and also come up
with alternative proposals. Once this is adopted and implemented in
CPython, Python is stuck with it forever.