[Python-Dev] PEP 383 (again)

Thomas Breuel tmbdev at gmail.com
Wed Apr 29 00:30:42 CEST 2009

> On Windows, the Wide APIs are already used throughout the code base,
> e.g. SetEnvironmentVariableW/_wenviron. If you need to find out the
> specific API for a specific functionality, please read the source code.
> [...]
> No, I don't assume that. I assume that all functions are strictly
> available in a Wide character version, and have verified that they are.

The wide APIs use UTF-16.  UTF-16 suffers from the same problem as UTF-8:
not all sequences of 16-bit words are valid UTF-16 sequences.  In particular,
sequences containing isolated (unpaired) surrogates are not well-formed
according to the Unicode standard.  Therefore, the existence of a wide
character API function does not guarantee that the wide character strings it
returns can be converted into valid Unicode strings.  And, in fact, Windows
Vista happily creates files whose names are malformed UTF-16, and
os.listdir() happily returns them.
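
As a minimal sketch of the point about well-formedness (assuming a current
CPython 3 interpreter; the helper name is_well_formed_utf16 is made up for
this illustration and is not part of any API), the strict UTF-16 codec
accepts a proper surrogate pair but rejects an unpaired surrogate:

```python
import struct


def is_well_formed_utf16(code_units):
    """Return True if the 16-bit code units form well-formed UTF-16.

    Hypothetical helper for illustration only: it packs the code units as
    little-endian 16-bit words and asks the strict codec to decode them.
    """
    raw = struct.pack("<%dH" % len(code_units), *code_units)
    try:
        raw.decode("utf-16-le")
    except UnicodeDecodeError:
        return False
    return True


# High surrogate followed by low surrogate: a valid pair (one code point).
paired = [0xD83D, 0xDE00]
# 'A', an unpaired high surrogate, 'B': not well-formed.
lone_high = [0x0041, 0xD800, 0x0042]
# A low surrogate with no preceding high surrogate: not well-formed.
lone_low = [0xDC00]

print(is_well_formed_utf16(paired))
print(is_well_formed_utf16(lone_high))
print(is_well_formed_utf16(lone_low))
```

A wide-character API can hand back any of these three sequences; only the
first one corresponds to a valid Unicode string.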

> If you can crash Python that way,
> nothing gets worse by this PEP - you can then *already* crash Python
> in that way.

Yes, but AFAIK, Python does not currently have functions that, as part of
correct usage and normal operation, are intended to generate malformed
Unicode strings.

Under your proposal, passing the output from a correctly implemented file
system or other OS function to a correctly written library using Unicode
strings may crash Python.  In order to avoid that, every library that's
built into Python would have to be checked and updated to deal with both the
Unicode standard and your extension to it.
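
The concern can be sketched as follows.  This assumes the "surrogateescape"
error handler as it exists in current CPython 3 (the name comes from the PEP
as later implemented, not from this message); roundtrip_via_utf8 is a
hypothetical stand-in for library code, and the failure shown here is an
exception rather than an interpreter crash:

```python
def roundtrip_via_utf8(text):
    """Hypothetical library routine that assumes any str encodes as UTF-8."""
    return text.encode("utf-8").decode("utf-8")


# A file name as raw bytes: Latin-1 encoded, so not valid UTF-8.
raw_name = b"caf\xe9.txt"

# Decoding with surrogateescape maps the undecodable byte 0xE9 to the lone
# surrogate U+DCE9 instead of raising an error.
name = raw_name.decode("utf-8", "surrogateescape")

# Code that is unaware of the convention fails on the lone surrogate.
try:
    roundtrip_via_utf8(name)
    library_failed = False
except UnicodeEncodeError:
    library_failed = True

# Code that is aware of the convention can recover the original bytes.
recovered = name.encode("utf-8", "surrogateescape")

print(repr(name), library_failed, recovered == raw_name)
```

Only code that knows to pass the same error handler gets the bytes back;
everything else sees a string that strict encoders refuse.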

