[Python-Dev] PEP 383 (again)
"Martin v. Löwis"
martin at v.loewis.de
Tue Apr 28 22:04:12 CEST 2009
> Your proposal says that utf-8b would be used for file systems, but then
> you also say that it might be used for command line arguments and
> environment variables. So, which specific APIs will it be used with on
> Windows and on POSIX systems?
On Windows, the Wide APIs are already used throughout the code base,
e.g. SetEnvironmentVariableW/_wenviron. If you need to find out the
specific API for a specific functionality, please read the source code.
> Or will utf-8b simply not be available
> on Windows at all?
It will be available, but it won't be used automatically for
> What happens if I create a Python version of tar,
> utf-8b strings slip in there, and I try to use them on Windows?
No need to create it - the tarfile module is already there. By
"in there", do you mean on the file system, or in the tarfile?
> You also assume that all Windows file system functions strictly conform
> to UTF-16 in practice (not just on paper). Have you verified that?
No, I don't assume that. I assume that all functions are strictly
available in a Wide character version, and have verified that they are.
> What's the situation on Windows CE?
I can't see how this question is relevant to the PEP. The PEP says this:
# On Windows, Python uses the wide character APIs to access
# character-oriented APIs, allowing direct conversion of the
# environmental data to Python str objects.
This is what it already does, and this is what it will continue to do.
> Another question on Linux: what happens when I decode a file system path
> with utf-8b and then pass the resulting unicode string to Gnome?
You probably get mojibake or an error; I didn't try.
> To windows.forms? To Java?
How do you do that, on Linux?
> To a unicode regular expression library?
You mean, SRE? SRE will match the code points as individual characters
of class Cs (surrogates). You should have been able to find that out
for yourself.
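The point about SRE and class Cs can be checked with a small sketch. It
assumes a Python 3 release in which the scheme called utf-8b in this thread
is exposed as the "surrogateescape" error handler; the byte string below is
a made-up file name, not one taken from the discussion.

```python
import re
import unicodedata

# Hypothetical file name: "caf" followed by a byte that is invalid in UTF-8.
raw = b"caf\xff"

# Assumption: "surrogateescape" is the released name of the utf-8b scheme.
# The undecodable byte 0xFF becomes the lone surrogate U+DCFF.
name = raw.decode("utf-8", "surrogateescape")

# The escaped byte is one code point in general category Cs (surrogate).
escaped = name[-1]
category = unicodedata.category(escaped)

# The regular expression engine treats it as one ordinary character.
match = re.fullmatch(r"caf(.)", name)

# Encoding with the same handler restores the original bytes.
round_trip = name.encode("utf-8", "surrogateescape")
```

Here `name` has four code points, `category` is "Cs", the `.` in the
pattern consumes the escaped byte as a single character, and `round_trip`
equals `raw`.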
> To wprintf?
Depends on the wprintf implementation.
> AFAIK, the behavior of most libraries is
> undefined for the kinds of unicode strings you construct, and it may be
> undefined in a bad way (crash, buffer overflow, whatever).
Indeed so. This is intentional. If you can crash Python that way,
nothing gets worse by this PEP - you can then *already* crash Python
in that way.
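As a rough illustration of "already": a string containing a lone surrogate
can be built without decoding any file name at all. The sketch below assumes
Python 3 and uses the "surrogateescape" handler name for the utf-8b scheme;
it only shows that the two strings are identical and that strict encoding
rejects them, and it says nothing about how any particular external library
behaves.

```python
# A lone surrogate built directly, with no file system involved.
lone = chr(0xDCFF)

# The same code point produced by decoding an invalid byte.
# Assumption: "surrogateescape" is the released name of the utf-8b scheme.
decoded = b"\xff".decode("utf-8", "surrogateescape")

# Both routes yield the same one-character string.
same = (lone == decoded)

# A strict UTF-8 encode refuses lone surrogates, so code that converts
# strictly at a library boundary sees an error rather than bad bytes.
try:
    lone.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```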