The "program files" and "user" directories should still have names representable in the normal locale used by the user, so they are able to access them by using their standard encoding in a Python narrow character string passed to the open function.
"Should" or "will"?
I don't understand what "their standard encoding" is here. My understanding is that "their standard encoding" is whatever WideCharToMultiByte() produces for the current ANSI code page, and that is exactly what the "mbcs" codec does.
My understanding is that their "default encoding" bears no relationship to encoding names as known by Python. That is, given a user's locale, there is no reasonable way to determine which of Python's encoding names will always work correctly on these strings.
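In Python terms, the "mbcs" codec sidesteps the locale-to-codec-name problem by delegating to the Windows ANSI code page conversion itself. The following is a minimal sketch. `narrow_codec_name()` is a hypothetical helper. The fallback to `sys.getfilesystemencoding()` on platforms without "mbcs" is an assumption, added only so the sketch runs outside Windows.

```python
import codecs
import sys


def narrow_codec_name() -> str:
    """Pick the codec used to turn a Unicode filename into a narrow string.

    On Windows builds, Python registers an "mbcs" codec that converts via
    the current ANSI code page, so the caller never has to map the user's
    locale to a Python encoding name.
    """
    try:
        codecs.lookup("mbcs")  # raises LookupError where "mbcs" is absent
        return "mbcs"
    except LookupError:
        # Hypothetical fallback so this sketch also runs off Windows.
        return sys.getfilesystemencoding()
```

On Windows this returns "mbcs". Elsewhere it returns whatever the interpreter reports as its filesystem encoding.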
The way I see it, to fix this we have 2 basic choices when a Unicode object is passed as a filename:
- we call the Unicode versions of the CRTL.
This is by far the better approach IMO, as it is more general and will work for people who switch locales or who want to access files created by others using other locales, although you can always use the horrid mangled "*~1" names.
- we auto-encode using the "mbcs" encoding, and still call the non-Unicode versions of the CRTL.
This will improve things, but to a lesser extent than the above. It may be the best possible on Windows 95.
I understand the above, but I want to resist having different NT and 9x versions of Python for obvious reasons. I also want to avoid determining at runtime whether the platform has Unicode support and magically switching to the Unicode APIs.
I concur on the "may be the best possible on 95" point and see no real downsides on NT, other than the freak possibility of the default encoding being changed _between_ our encoding a string and the OS decoding it.
Recall that my change is only to convert from Unicode to a string so the file system can convert back to Unicode. There is no real opportunity for the current locale to change on this thread during this process.
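The round-trip argument can be checked directly. If the code page stays fixed, encoding and then decoding returns the original name whenever every character is representable, and it fails otherwise. The sketch below uses Python's "cp1252" (Western European) and "cp932" (Japanese) codecs as stand-ins for two users' ANSI code pages. `roundtrips()` is a hypothetical helper. Real "mbcs" behaviour on a given Windows version may differ in details, such as how unrepresentable characters are handled.

```python
def roundtrips(name: str, code_page: str) -> bool:
    """Return True if `name` survives encode-then-decode under `code_page`."""
    try:
        return name.encode(code_page).decode(code_page) == name
    except UnicodeError:
        # The name contains characters this code page cannot represent.
        return False


print(roundtrips("café.txt", "cp1252"))  # True
print(roundtrips("café.txt", "cp932"))   # False: cp932 has no "é"
print(roundtrips("日本.txt", "cp932"))    # True
```

The failing case illustrates why calling the Unicode CRTL is the more general approach. A name created under one code page may simply have no narrow spelling under another.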
I guess I see 3 options:
1) Do nothing, thereby forcing the user to manually encode the Unicode object. Only by encoding it can they access these filenames, which means the exact same issues apply.
2) Move to Unicode APIs where available, which will be a much deeper patch and much harder to get right on non-Unicode Windows platforms.
3) Like 1, but simply automate the encoding task.
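Option 3 amounts to moving option 1's manual encoding step inside the file API. The following is a minimal sketch built around a hypothetical `auto_encode_filename()` helper, which a narrow-string `open()` would call first. The `code_page` parameter stands in for "mbcs" so the sketch runs on any platform. Strict error handling is a choice made for this sketch.

This only models the design under discussion. Since PEP 529 (Python 3.6), bytes paths on Windows are treated as UTF-8, so mbcs-encoded bytes are not what a current Python expects.

```python
from typing import Union


def auto_encode_filename(name: Union[str, bytes], code_page: str) -> bytes:
    """Option 3: perform the encoding that option 1 leaves to the caller.

    Narrow (bytes) names pass through unchanged. Unicode names are encoded
    with `code_page` (e.g. "mbcs" on Windows). Encoding is strict, so a name
    the code page cannot represent raises UnicodeEncodeError instead of
    being silently mangled.
    """
    if isinstance(name, bytes):
        return name
    return name.encode(code_page, "strict")


print(auto_encode_filename("café.txt", "cp1252"))    # b'caf\xe9.txt'
print(auto_encode_filename(b"report.txt", "cp1252"))  # b'report.txt'
```

For any given code page, the helper produces exactly the bytes a user following option 1 would have produced by hand. It therefore inherits the same limitations, only without the manual step.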
My proposal was to do (3). It is not clear from your mail what you propose. Like me, you seem to agree (2) would be perfect in an ideal world, but you also agree we don't live in one.
What is your recommendation?