Support byte string API of Windows in Python3?
Hi,

I'm working on surrogates in filenames on Linux (and more generally on BSD and other UNIX OSes) to support undecodable filenames; see PEP 383. Amaury told me that I only fixed the non-Windows versions (I fixed subprocess for the current working directory, and _ctypes.dlopen()), and that it doesn't work on Windows.

It's a choice: I didn't want to patch Windows because I know that Windows uses Unicode internally. I consider that developers using Python3 should use unicode on Windows, and bytes or unicode+surrogates on other OSes.

I don't know the Windows API well, so I would like your opinion about that ;-)

-- 
Victor Stinner http://www.haypocalc.com/
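For readers unfamiliar with PEP 383, here is a minimal sketch of the "unicode+surrogates" representation mentioned above. The filename `caf\xe9.txt` is a made-up example, and UTF-8 is assumed as the filesystem encoding; the real encoding depends on the locale.

```python
# A hypothetical POSIX filename containing a byte (0xE9) that is not valid UTF-8.
raw_name = b"caf\xe9.txt"

# PEP 383: with the "surrogateescape" handler, each undecodable byte 0xNN
# is mapped to the lone surrogate code point U+DCNN instead of raising.
text_name = raw_name.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text_name.encode("utf-8", errors="surrogateescape")

print(repr(text_name))
print(round_tripped == raw_name)
```

The point of the round trip is that a str obtained from an undecodable filename can be handed back to the OS without losing information.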
On Monday 19 April 2010 11:33:58, Victor Stinner wrote:
I'm working on surrogates in filenames on Linux (...)
Related issues:

- #8391: os.execvpe() doesn't support surrogates in env
- #8393: subprocess: support undecodable current working directory on POSIX OS
- #8394: ctypes.dlopen() doesn't support surrogates
- #8412: os.system() doesn't support surrogates nor bytes

I fixed the last three issues (#8393, #8394, #8412) for non-Windows OSes.

-- 
Victor Stinner http://www.haypocalc.com/
Victor Stinner <victor.stinner <at> haypocalc.com> writes:
It's a choice: I didn't want to patch Windows because I know that Windows uses Unicode internally. I consider that developers using Python3 should use unicode on Windows, and bytes or unicode+surrogates on other OSes.
I think both possibilities should be available on all OSes, so as to make it easier to write cross-platform code. Having to switch between bytes and unicode depending on the OS means developers will have to deal with encoding issues themselves, which is suboptimal from a language usability point of view.

Regards

Antoine.
Antoine Pitrou wrote:
Victor Stinner <victor.stinner <at> haypocalc.com> writes:
It's a choice: I didn't want to patch Windows because I know that Windows uses Unicode internally. I consider that developers using Python3 should use unicode on Windows, and bytes or unicode+surrogates on other OSes.
I think both possibilities should be available on all OSes, so as to make it easier to write cross-platform code. Having to switch between bytes and unicode depending on the OS means developers will have to deal with encoding issues themselves, which is suboptimal from a language usability point of view.
Indeed, you shouldn't be switching. Instead, you should be using Unicode strings all the time. Regards, Martin
I'm working on surrogates in filenames on Linux (and more generally on BSD and other UNIX OSes) to support undecodable filenames; see PEP 383. Amaury told me that I only fixed the non-Windows versions (I fixed subprocess for the current working directory, and _ctypes.dlopen()), and that it doesn't work on Windows.
It's a choice: I didn't want to patch Windows because I know that Windows uses Unicode internally. I consider that developers using Python3 should use unicode on Windows, and bytes or unicode+surrogates on other OSes.
I don't know the Windows API well, so I would like your opinion about that ;-)
Can you please elaborate what the specific issue is? I completely fail to see what byte strings have to do with surrogate codes. AFAICT, on Windows, you can just use surrogate codes at the APIs, and be done. Regards, Martin
On Monday 19 April 2010 22:55:39, you wrote:
Can you please elaborate what the specific issue is?
Amaury reopened my issue #8393 "subprocess: support undecodable current working directory on POSIX OS" because "It does not work on Windows" (bytes are rejected).
I completely fail to see what byte strings have to do with surrogate codes. AFAICT, on Windows, you can just use surrogate codes at the APIs, and be done.
Before my patch, subprocess used PyArg_ParseTuple(args, "...z...", ...) to parse the current working directory: surrogates were rejected. But I specified in my issue title that the issue is specific to "POSIX OS". I should replace it with "non-Windows".

Amaury also reopened #8394 "ctypes.dlopen() doesn't support surrogates", because ctypes.CDLL() rejects byte strings. On Windows, Python3 uses LoadLibraryW() to load a library, and the Python API rejects byte strings.

The question was: should we change Python3 to accept byte strings on Windows?

I think that I can re-close these two issues because it's a good thing to avoid the evil, locale-dependent mbcs encoding ;-) Unicode is a superset of mbcs.

-- 
Victor Stinner http://www.haypocalc.com/
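The rejection described above can be imitated in pure Python. This is only a sketch, assuming that the "z" format converts a str to a C string with a strict UTF-8 encode; the helper name `encode_strict` and the sample string are hypothetical.

```python
# A str carrying a PEP 383 surrogate for the undecodable byte 0xE9.
name_with_surrogate = "caf\udce9"


def encode_strict(name):
    """Return the strict UTF-8 encoding of name, or None if it is rejected."""
    try:
        return name.encode("utf-8")
    except UnicodeEncodeError:
        # Lone surrogates are not encodable with the default "strict" handler.
        return None


strict_result = encode_strict(name_with_surrogate)

# With the surrogateescape handler, the surrogate maps back to the raw byte.
escaped_result = name_with_surrogate.encode("utf-8", errors="surrogateescape")

print(strict_result, escaped_result)
```

A conversion that goes through the surrogateescape handler is what lets such a path reach the OS unchanged on non-Windows systems.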
Amaury reopened my issue #8393 "subprocess: support undecodable current working directory on POSIX OS" because "It does not work on Windows" (bytes are rejected).
I see. I'd like to know whether that's an incompatible change. We shouldn't make incompatible changes in that matter. However, if you were not able to pass a bytes cwd on Windows before, it's fine that you still are not able to. I'm puzzled how your patch could have possibly affected Windows, since it only changed _posixsubprocess.c. So I'm closing the report again.
Amaury also reopened #8394 "ctypes.dlopen() doesn't support surrogates", because ctypes.CDLL() rejects byte strings.
Ok, I'll close this as well. Regards, Martin
Victor Stinner:
It's a choice: I didn't want to patch Windows because I know that Windows uses Unicode internally. I consider that developers using Python3 should use unicode on Windows, and bytes or unicode+surrogates on other OSes.
The Win32 byte string APIs convert their inputs to Unicode and then run Unicode code. You don't get additional capabilities by calling the byte string APIs and should avoid them completely. Including an easy way to invoke them on Windows will just lead to failures. People may think that Unix code that uses the byte string APIs for better platform fidelity can just run this code on Windows and get equivalent benefits. They won't and instead will see an inverted form of the problems they are trying to avoid on Unix. If there is ever a reason to use a byte string API on Windows (and I can't think of any) then ctypes can be used to explicitly call the API desired. Neil
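The "inverted form of the problem" can be sketched without touching the Win32 API at all. The example below assumes cp1252 as a stand-in for the ANSI code page (the real mbcs codec exists only on Windows and depends on the locale), and the Cyrillic filename is made up for illustration.

```python
# A hypothetical filename that is valid Unicode but outside the chosen code page.
unicode_name = "файл.txt"

# Assumption: cp1252 stands in for the locale-dependent ANSI code page (mbcs).
ANSI_CODE_PAGE = "cp1252"

# A strict conversion to the byte string form fails outright...
try:
    unicode_name.encode(ANSI_CODE_PAGE)
    representable = True
except UnicodeEncodeError:
    representable = False

# ...and a lossy conversion silently names a different file.
lossy_name = unicode_name.encode(ANSI_CODE_PAGE, errors="replace")

print(representable, lossy_name)
```

On Unix the bytes are the ground truth and some of them may not decode; on Windows the Unicode name is the ground truth and some names may not encode, which is why a byte string interface there can only lose information.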
participants (4)
- "Martin v. Löwis"
- Antoine Pitrou
- Neil Hodgson
- Victor Stinner