[Python-Dev] Unicode Imports

Nick Coghlan ncoghlan at gmail.com
Sat Sep 9 19:05:36 CEST 2006

David Hopwood wrote:
> Martin v. Löwis wrote:
>> Nick Coghlan wrote:
>>> So this is taking something that *already works properly on POSIX
>>> systems* and making it work on Windows as well.
>> I doubt it does without side effects. For example, an application that
>> would go through sys.path, and encode everything with
>> sys.getfilesystemencoding() currently works, but will break if the patch
>> is applied and non-mbcs strings are put on sys.path.
> Huh? It won't break on any path for which it is not already broken.
> You seem to be saying "Paths with non-mbcs strings shouldn't work on Windows,
> because they haven't worked in the past."

I think MvL is looking at it from the point of view of consumers of the list 
of strings in sys.path, such as PEP 302 importer and loader objects, and tools 
like modulefinder. Currently, the values in sys.path are limited to:

1. 8-bit strings
2. Unicode strings containing only characters which can be encoded using the 
default file system encoding
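As a rough illustration of those two constraints, the check below is a minimal sketch, not code from the core or from any existing importer. It assumes Python 3 semantics, with `bytes` standing in for 2.x 8-bit strings and `str` for 2.x Unicode strings, and it uses "cp1252" as a stand-in for the Windows "mbcs" file system encoding so it runs on any platform. The name `path_entry_is_compatible` is hypothetical.

```python
# Hypothetical helper: classify a sys.path-style entry against the two
# constraints listed above.
# Assumptions (not from the original post): Python 3 bytes model 2.x 8-bit
# strings, str models 2.x unicode, and "cp1252" stands in for "mbcs".

def path_entry_is_compatible(entry, fs_encoding="cp1252"):
    """Return True if entry is one of the kinds of value sys.path is limited to."""
    if isinstance(entry, bytes):
        # 1. 8-bit strings are always allowed.
        return True
    if isinstance(entry, str):
        # 2. Unicode strings are allowed only if every character can be
        #    encoded with the file system encoding.
        try:
            entry.encode(fs_encoding)
        except UnicodeEncodeError:
            return False
        return True
    # Anything else is outside the current contract.
    return False
```

Under these assumptions, an entry such as "C:\\Übersicht" passes, while a path containing CJK characters fails: that second kind of entry is what a non-mbcs patch would newly allow onto sys.path.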

For PEP 302 loaders, it is therefore currently correct to take the 8-bit string 
they receive and do "path.decode(sys.getfilesystemencoding())".
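The sketch below shows, under the same assumptions (Python 3 `bytes` for 8-bit strings, "cp1252" in place of "mbcs"), the decode step an existing loader can rely on, alongside the kind of consumer Martin describes, which encodes every entry with the file system encoding. Both helper names are hypothetical and are not part of any real importer API.

```python
# Hypothetical helpers illustrating the current sys.path contract.
# Assumptions: bytes model 2.x 8-bit strings; "cp1252" stands in for
# sys.getfilesystemencoding() on Windows ("mbcs").

def decode_path_entry(path, fs_encoding="cp1252"):
    """Turn a sys.path entry into text the way an existing loader would."""
    if isinstance(path, bytes):
        # Counterpart of 2.x path.decode(sys.getfilesystemencoding())
        return path.decode(fs_encoding)
    return path

def encode_all(paths, fs_encoding="cp1252"):
    """Encode every entry to bytes, as the application Martin describes does."""
    return [p if isinstance(p, bytes) else p.encode(fs_encoding) for p in paths]
```

Both helpers work for every value the current contract permits, but once a Unicode entry the file system encoding cannot represent (for example "C:\\\u65e5\u672c") reaches sys.path, `encode_all` raises UnicodeEncodeError. That is the breakage being discussed.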

Kristján's patch works nicely for his application because he doesn't have to 
worry about compatibility with existing loaders and utilities. The core 
doesn't have that luxury.

We *might* be able to find a backwards-compatible way to do it that could be 
put into 2.5.x, but that effort could be spent more profitably elsewhere, 
particularly since the import system in Py3k will be based entirely on 
Unicode (as GvR pointed out the last time this topic came up [1]).



Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
