adding Py{String|Unicode}_{Lower|Upper} and fixing CreateProcess in _subprocess.pyd and PyWin32

There is a subtlety in CreateProcess in the Win32 API in that if one specifies an environment (via the lpEnvironment argument), the environment strings (A) must be sorted alphabetically and (B) that sort must be case-insensitive. See the Remarks section on:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/bas...

If this is not done, then surprises can happen with the use of {Get|Set}EnvironmentVariable in the created process:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/bas...

Neither _subprocess.pyd (supporting the new subprocess.py module on Windows) nor PyWin32's CreateProcess binding do this. I haven't done so yet, but I should be able to put together a test case for subprocess.py for this. We came across such a surprise when using my process.py module that uses this PyWin32 code (which it looks like _subprocess.c borrowed).

Fixing (A) is easy with a "PyList_Sort(keys)" and some other minor changes to _subprocess.c::getenvironment() -- and to win32process.i::CreateEnvironmentString() in PyWin32.

However, I'd like some guidance on the best way to case-insensitively sort a Python list in C code to fix (B). The best thing I see would be to expose PyString_Lower/PyUnicode_Lower and/or PyString_Upper/PyUnicode_Upper so they can be used to .lower()/.upper() the given environment mapping keys for sorting.

Does that sound reasonable? Is there some problem to this approach that anyone can see?

Trent

-- 
Trent Mick
trentm@activestate.com
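
For illustration, requirements (A) and (B) can be sketched in pure Python. This is only a sketch under stated assumptions, not the code in _subprocess.c or PyWin32: build_environment_block is a hypothetical helper, the choice of upper-casing for the sort key is an assumption, and the block layout (NUL-terminated "NAME=value" entries followed by one extra NUL) follows the documented CreateProcess environment block format.

```python
def build_environment_block(env):
    """Return a CreateProcess-style environment block from a mapping.

    Hypothetical helper for illustration only.
    """
    # (A) + (B): sort the variable names case-insensitively.
    # Using upper() rather than lower() is an assumption.
    keys = sorted(env, key=str.upper)
    # Each entry is "NAME=value" followed by NUL; the block itself
    # ends with one additional NUL.
    return "".join(f"{k}={env[k]}\0" for k in keys) + "\0"


block = build_environment_block(
    {"path": r"C:\bin", "TEMP": r"C:\tmp", "Home": r"C:\me"}
)
```

The original spelling of each name is kept in the block; only the sort key is case-folded.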

Trent Mick wrote:
> However, I'd like some guidance on the best way to case-insensitively sort a Python list in C code to fix (B). The best thing I see would be to expose PyString_Lower/PyUnicode_Lower and/or PyString_Upper/PyUnicode_Upper so they can be used to .lower()/.upper() the given environment mapping keys for sorting.
> Does that sound reasonable? Is there some problem to this approach that anyone can see?

Here we go. Exposing PyString_Lower would be a new feature; we are in beta, so new features are not acceptable. That said, I personally could quite well accept such a feature (exposing more C API) between the beta and the final release.

Even so, I think exposing PyString_Lower would not be desirable (whether now or later). Instead, you should use PyObject_CallMethod to invoke .lower().

Regards,
Martin

Trent Mick wrote:
> There is a subtlety in CreateProcess in the Win32 API in that if one specifies an environment (via the lpEnvironment argument), the environment strings (A) must be sorted alphabetically and (B) that sort must be case-insensitive. See the Remarks section on: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/bas...
> If this is not done, then surprises can happen with the use of {Get|Set}EnvironmentVariable in the created process: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/bas...
> Neither _subprocess.pyd (supporting the new subprocess.py module on Windows) nor PyWin32's CreateProcess binding do this. I haven't done so yet, but I should be able to put together a test case for subprocess.py for this. We came across such a surprise when using my process.py module that uses this PyWin32 code (which it looks like _subprocess.c borrowed).
> Fixing (A) is easy with a "PyList_Sort(keys)" and some other minor changes to _subprocess.c::getenvironment() -- and to win32process.i::CreateEnvironmentString() in PyWin32.
> However, I'd like some guidance on the best way to case-insensitively sort a Python list in C code to fix (B). The best thing I see would be to expose PyString_Lower/PyUnicode_Lower and/or PyString_Upper/PyUnicode_Upper so they can be used to .lower()/.upper() the given environment mapping keys for sorting.
> Does that sound reasonable? Is there some problem to this approach that anyone can see?

If you want to sort the list in C, it's better to provide a C sorting function. That function can then use Py_UNICODE_TOUPPER(ch) and toupper() for the comparison. Calling the .upper() method on the object would be much too expensive. Ditto for creating new objects just for the purpose of comparing two objects.

I think it would be worthwhile to consider replacing the string implementation's direct usage of toupper(), tolower() etc. with a Py_STRING_TOUPPER(ch) et al. approach, much like the Unicode object does, at least for consistency reasons.

In the future, I think it would be best to move away from the C lib implementation of toupper(), tolower() et al., because these are affected by the current locale settings, which is not what you'd normally expect in Python.

-- 
Marc-Andre Lemburg
eGenix.com
Professional Python Services directly from the Source (#1, Oct 27 2004)
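
The shape of such a comparison function can be sketched in Python. This is only an illustration of the logic, assuming a character-by-character comparison of upper-cased code points; compare_ignoring_case is a hypothetical name, and unlike a C implementation built on Py_UNICODE_TOUPPER(ch), this Python version does create temporary objects, so it shows the algorithm rather than the performance benefit.

```python
from functools import cmp_to_key


def compare_ignoring_case(a, b):
    """Three-way compare of two strings by upper-cased characters.

    Hypothetical helper; returns -1, 0 or 1 like a C comparison callback.
    """
    for ca, cb in zip(a, b):
        ua, ub = ca.upper(), cb.upper()
        if ua != ub:
            return -1 if ua < ub else 1
    # The common prefix is equal, so the shorter string sorts first.
    return (len(a) > len(b)) - (len(a) < len(b))


names = ["path", "TEMP", "Home", "PATHEXT"]
names.sort(key=cmp_to_key(compare_ignoring_case))
```

str.upper() here is locale-independent, which matches the concern about the C library's toupper() raised in the message.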

> However, I'd like some guidance on the best way to case-insensitively sort a Python list in C code to fix (B). The best thing I see would be to expose PyString_Lower/PyUnicode_Lower and/or PyString_Upper/PyUnicode_Upper so they can be used to .lower()/.upper() the given environment mapping keys for sorting.

Why not mimic the pure python approach?

    lowerfunc = PyObject_GetAttrString(&PyUnicode_Type, "lower");
    PyObject_CallMethod(mylist, "sort", "OO", Py_None, lowerfunc);

Raymond
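
The pure Python approach that the C calls above mirror would look roughly like this. It is a sketch written for a current Python, where str.lower plays the role that unicode.lower plays in the C snippet; the variable names are illustrative only.

```python
names = ["path", "TEMP", "Home"]

# Equivalent of calling mylist.sort(None, lowerfunc) from C:
# the second argument of list.sort() in that era was the key function.
names.sort(key=str.lower)
```

The key function is called once per element, so the cost of lower-casing is paid N times rather than once per comparison.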

> Why not mimic the pure python approach?
>
>     lowerfunc = PyObject_GetAttrString(&PyUnicode_Type, "lower");
>     PyObject_CallMethod(mylist, "sort", "OO", Py_None, lowerfunc);

unicode.lower() doesn't like non-unicode objects, so this would only work if you know for sure there are no str objects in the list.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
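
The same caveat can be reproduced in a current Python, where str and bytes stand in (as an analogy, not an exact equivalent) for the unicode and str types discussed here. The sketch below shows that an unbound method taken from one type rejects objects of the other, while calling .lower() on each object, which is what PyObject_CallMethod does, dispatches on the object's own type.

```python
mixed = ("path", b"TEMP", "Home")

# Using the unbound str.lower as the key fails on the bytes element.
try:
    sorted(mixed, key=str.lower)
    unbound_ok = True
except TypeError:
    unbound_ok = False

# Calling the method on each object picks the right implementation.
lowered = [item.lower() for item in mixed]
```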

[Trent Mick]
> There is a subtlety in CreateProcess in the Win32 API in that if one specifies an environment (via the lpEnvironment argument), the environment strings (A) must be sorted alphabetically and (B) that sort must be case-insensitive.

Well, the docs are lying about Win95 (it doesn't sort envar blocks). On NT+, the sort must also be in "Unicode order, without regard to locale" -- WTF that means:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/bas...

The best discussion I found:
http://www.mail-archive.com/cygwin@cygwin.com/msg15239.html

That claims NT+ uses RTLCompareUnicodeString to search for envars (and that function has a case-insensitive option).

Tim Peters wrote:
> Well, the docs are lying about Win95 (it doesn't sort envar blocks). On NT+, the sort must also be in "Unicode order, without regard to locale" -- WTF that means:

It means that the sorting uses ordinal character numbers, not the order characters have in, say, the German alphabet.

The interesting question then is what case-insensitive sorting by ordinal number means: you convert either to lower case or to upper case (again, independent of locale), and sort by ordinal number after that conversion. The question then becomes which of the two to use, since the resulting order depends on that choice. I believe Windows always uses upper case for this kind of thing. At least NTFS sorts by upper case, by means of the $UpCase file.
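
A small example shows why the choice matters. In ordinal order the underscore (0x5F) sits between the upper-case letters (0x41-0x5A) and the lower-case letters (0x61-0x7A), so two names that differ only where one has an underscore and the other a letter sort differently depending on the case conversion. The variable names below are made up for illustration.

```python
names = ["FOO_BAR", "FOOBAR"]

# Upper-cased keys: 'B' (0x42) < '_' (0x5F), so FOOBAR comes first.
by_upper = sorted(names, key=str.upper)

# Lower-cased keys: '_' (0x5F) < 'b' (0x62), so FOO_BAR comes first.
by_lower = sorted(names, key=str.lower)
```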
> That claims NT+ uses RTLCompareUnicodeString to search for envars (and that function has a case-insensitive option).

That sounds reasonable, and it should be used, then, because it will be impossible to emulate that function with Python library functions (if for no other reason than that RTLCompareUnicodeString probably assumes Unicode 2.2, whereas Python only has the Unicode 3.2 database).

Regards,
Martin

participants (6)
- "Martin v. Löwis"
- Guido van Rossum
- M.-A. Lemburg
- Raymond Hettinger
- Tim Peters
- Trent Mick