[Python-Dev] Unicode Imports

Nick Coghlan ncoghlan at gmail.com
Sat Sep 9 07:55:56 CEST 2006

Martin v. Löwis wrote:
> Steve Holden schrieb:
>> Or simply that this inability isn't currently 
>> described in a bug report on Sourceforge?
> No: sys.path is specified (originally) as containing a list of byte
> strings; it was extended to also support path importers (or whatever
> that PEP calls them). It was never extended to support Unicode strings.

That other PEP being PEP 302. That said, Unicode strings *are* permitted on 
sys.path - the import system will automatically encode them to an 8-bit string 
using the default filesystem encoding as part of the import process.
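
As a rough illustration of that encoding step (a sketch in present-day Python 3
syntax, not the actual import code; the helper name encode_path_entry and the
example directory are made up for this message):

```python
import sys

def encode_path_entry(entry, encoding=None):
    """Return a sys.path entry as bytes (hypothetical helper).

    Text entries are encoded with the filesystem encoding unless another
    encoding is passed in; byte strings are returned unchanged.
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    if isinstance(entry, bytes):
        return entry
    return entry.encode(encoding)

# With a UTF-8 filesystem encoding, a Chinese directory name round-trips.
encoded = encode_path_entry("/home/user/\u4e2d\u6587", encoding="utf-8")
decoded = encoded.decode("utf-8")
```

Passing the encoding explicitly here only keeps the example independent of the
machine it runs on; the import system itself uses whatever the default
filesystem encoding happens to be.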

This works fine on Unix systems that use UTF-8 encoded strings to handle 
Unicode paths at the C API level, but it breaks on Windows because the 
default mbcs filesystem encoding can't represent the full range of possible 
Unicode path names (such as the Chinese directories that originally gave 
Kristján grief).
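
To make the difference concrete, here is a small sketch (again in Python 3
syntax). It uses cp1252 as a stand-in for a typical Western-European ANSI code
page, since the mbcs codec only exists on Windows; the helper can_encode and
the directory name are assumptions for illustration. The real mbcs codec may
substitute characters rather than raise an error, but the path is lost either
way.

```python
def can_encode(path, encoding):
    """Report whether every character of path is representable in encoding."""
    try:
        path.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

# Hypothetical directory with a Chinese name.
chinese_dir = "C:\\\u4e2d\u6587\\lib"

utf8_ok = can_encode(chinese_dir, "utf-8")     # UTF-8 covers all of Unicode
cp1252_ok = can_encode(chinese_dir, "cp1252")  # cp1252 has no CJK characters
```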

To get Unicode path names to work on Windows, you have to use the 
Windows-specific wide character API instead of the normal C API, and the 
import machinery doesn't do that.

So this is taking something that *already works properly on POSIX systems* and 
making it work on Windows as well.

>> I agree it's a relatively large patch for a release candidate but if 
>> prudence suggests deferring it, it should be a *definite* for 2.5.1 and 
>> subsequent releases.
> I'm not so sure it should. It *is* a new feature: it makes applications
> possible which aren't possible today, and the documentation does not
> ever suggest that these applications should have been possible. In fact,
> it is common knowledge that this currently isn't supported.

It should already work fine on POSIX filesystems that use the default 
filesystem encoding for path names. As far as I am aware, it is only Windows 
where it doesn't work.


Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
