[Python-Dev] Unicode Imports

Sat Sep 9 20:52:48 CEST 2006

Nick Coghlan wrote:
> David Hopwood wrote:
>> Martin v. Löwis wrote:
>>> Nick Coghlan schrieb:
>>>
>>>> So this is taking something that *already works properly on POSIX
>>>> systems* and making it work on Windows as well.
>>>
>>> I doubt it does without side effects. For example, an application that
>>> would go through sys.path, and encode everything with
>>> sys.getfilesystemencoding() currently works, but will break if the patch
>>> is applied and non-mbcs strings are put on sys.path.
>>
>> Huh? It won't break on any path for which it is not already broken.
>>
>> You seem to be saying "Paths with non-mbcs strings shouldn't work on
>> Windows, because they haven't worked in the past."
> 
> I think MvL is looking at it from the point of view of consumers of the
> list of strings in sys.path, such as PEP 302 importer and loader
> objects, and tools like modulefinder. Currently, the list of values in
> sys.path is limited to:
> 
> 1. 8-bit strings
> 2. Unicode strings containing only characters which can be encoded using
> the default file system encoding

On Windows, file system pathnames can contain arbitrary Unicode characters
(well, almost: a handful of characters such as \ / : * ? " < > | are
reserved). Despite the existence of "ANSI" filesystem APIs, and regardless
of what sys.getfilesystemencoding() returns, NTFS and the long-name (VFAT)
extension of FAT store pathnames as UTF-16LE.
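To make the mismatch concrete, here is a minimal Python 3 sketch. It assumes cp1252 as a stand-in for a typical Western "ANSI" code page. The file name and the helper can_encode() are hypothetical and are used only for illustration. The same name that round-trips losslessly through UTF-16LE cannot be represented in the code page at all:

```python
# Python 3 sketch. Assumption: cp1252 stands in for a typical "ANSI" code page.
# The file name below is a made-up example mixing Latin, CJK and Cyrillic.
name = "r\u00e9sum\u00e9_\u65e5\u672c_\u0414\u043e\u043a.py"


def can_encode(text, codec):
    """Hypothetical helper: True if every character of text fits in codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


# UTF-16LE can represent the whole name and decodes back to the same string.
utf16 = name.encode("utf-16-le")
assert utf16.decode("utf-16-le") == name

print(can_encode(name, "utf-16-le"))  # True
print(can_encode(name, "cp1252"))     # False: the CJK and Cyrillic characters are missing
```

The accented Latin characters happen to fit in cp1252. The other scripts do not, so any code page narrower than Unicode will reject some names that NTFS accepts.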

Thus, either:
 - the fact that sys.getfilesystemencoding() returns a non-Unicode encoding
   on Windows is a bug, or
 - any program that relies on sys.getfilesystemencoding() being able to
   encode arbitrary Windows pathnames has a bug.
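The second case is the kind of consumer described earlier in the thread, one that walks sys.path and encodes every entry. The following is a minimal Python 3 sketch of such a consumer. The function encode_path_entries() and the sample paths are hypothetical. The encoding is passed in explicitly, with cp1252 assumed as a stand-in for an "mbcs" code page, so that the result does not depend on the host. Instead of crashing on the first unencodable entry, the sketch collects the entries that fail:

```python
import sys


def encode_path_entries(entries, encoding=None):
    """Hypothetical sys.path consumer: encode each entry to bytes.

    Defaults to sys.getfilesystemencoding(). Entries that the encoding
    cannot represent are collected in `failed` rather than raising.
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    encoded, failed = [], []
    for entry in entries:
        try:
            encoded.append(entry.encode(encoding))
        except UnicodeEncodeError:
            failed.append(entry)
    return encoded, failed


# Made-up path entries; the last one contains CJK characters.
paths = ["C:\\Python25\\Lib", "C:\\Users\\r\u00e9sum\u00e9", "C:\\\u65e5\u672c\\lib"]

# Assumption: cp1252 approximates a Western "mbcs" code page.
ok, bad = encode_path_entries(paths, "cp1252")
print(len(ok), len(bad))  # 2 1

# With a Unicode encoding, nothing fails.
ok16, bad16 = encode_path_entries(paths, "utf-16-le")
print(len(ok16), len(bad16))  # 3 0
```

Whether the non-empty `failed` list reflects a bug in the consumer or in the encoding it was handed is exactly the question posed above.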

We need to decide which of these is the case.

-- 
David Hopwood <david.nospam.hopwood at blueyonder.co.uk>