[Python-Dev] unicode imports

Nick Coghlan ncoghlan at gmail.com
Mon Jun 19 15:46:13 CEST 2006


Kristján V. Jónsson wrote:
> Funny that no other platforms could benefit from a unicode import path.
> Does that mean that windows will reign supreme?  Please explain.

As near as I can tell, other platforms use encoded strings with the normal 
(byte-based) posix file API, so the Python interpreter and the file system 
simply need to agree on the encoding (typically utf-8) in order for both 
filesystem access and importing from non-ASCII paths to work.
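The byte-based model can be sketched in Python 3 syntax as below, assuming utf-8 as the agreed encoding. `encode_path` and `decode_path` are hypothetical helpers, not part of import.c or any Python API. The sketch shows that agreement on the encoding is what makes non-ASCII names work, and that a mismatch fails silently instead of raising an error.

```python
# Illustrative sketch (hypothetical helper names): on a byte-based POSIX file
# API the interpreter turns a text path into bytes with some encoding, and
# the file system stores and compares those raw bytes.

def encode_path(path, encoding="utf-8"):
    """Convert a text path into the bytes handed to the byte-based file API."""
    return path.encode(encoding)

def decode_path(raw, encoding="utf-8"):
    """Convert bytes read back from the file system into a text path."""
    return raw.decode(encoding)

path = "/tmp/\u814c"
raw = encode_path(path)  # b'/tmp/\xe8\x85\x8c' under utf-8

# Interpreter and file system agree on utf-8: the name survives the round trip.
agreed = decode_path(raw)

# They disagree (here the reader assumes latin-1): no error is raised,
# but the name comes back garbled.
mismatched = decode_path(raw, encoding="latin-1")

print(agreed == path)      # True
print(mismatched == path)  # False
```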

On Windows, though, most of the file system interaction code has had to be 
updated to use the wide-character API where possible. import.c is one of the 
few holdouts that relies entirely on the byte-based posix API.

If I had to put money on what's currently happening on your test machine, it's 
that import.c is trying to do u'c:/tmp/\u814c'.encode('mbcs'), getting 
'c:/tmp/?' and proceeding to do nothing useful with that path entry. Checking 
the result of sys.getfilesystemencoding() should be able to confirm that.
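The suspected failure can be reproduced on any platform. The 'mbcs' codec exists only on Windows, so this Python 3 sketch assumes cp1252 (a common Western ANSI code page) as a stand-in and uses errors="replace" to approximate the lossy conversion described above; `encode_for_byte_api` is a hypothetical name. Note that Python 3.6 and later report 'utf-8' from sys.getfilesystemencoding() on Windows (PEP 529), so the 'mbcs' result is specific to the Python 2.x interpreters discussed here.

```python
import sys

# 'mbcs' is Windows-only, so cp1252 stands in for the ANSI code page here
# (an assumption for illustration, not what import.c literally calls).
ANSI_STAND_IN = "cp1252"

def encode_for_byte_api(path, encoding=ANSI_STAND_IN):
    # errors="replace" turns characters the code page cannot represent into
    # '?', which is how the path entry gets silently mangled.
    return path.encode(encoding, errors="replace")

print(sys.getfilesystemencoding())
print(encode_for_byte_api("c:/tmp/\u814c"))  # b'c:/tmp/?'
```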

So it looks like it ain't really gonna work properly on Windows unless 
import.c is rewritten to use the Unicode-aware, platform-independent IO 
implementation in posixmodule.c.

Until that happens (hopefully by Python 2.6), I like MvL's suggestion: look up 
the 8.3 DOS name at the command prompt and put that into sys.path. ctypes 
and/or pywin32 should let you get at that information programmatically.
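A minimal ctypes sketch of the programmatic route, assuming the Win32 function GetShortPathNameW from kernel32; the helper name `short_path` is hypothetical. It asks once for the required buffer size, then fetches the 8.3 name. If short-name generation is disabled on the volume, the function may simply hand back the long path, so the workaround is not guaranteed to help. On non-Windows platforms the sketch returns the path unchanged.

```python
import ctypes
import os
import sys

def short_path(long_path):
    """Return the 8.3 short form of long_path on Windows; elsewhere return it as-is."""
    if os.name != "nt":
        return long_path
    from ctypes import wintypes  # imported here because it is only needed on Windows
    get_short = ctypes.windll.kernel32.GetShortPathNameW
    get_short.argtypes = [wintypes.LPCWSTR, wintypes.LPWSTR, wintypes.DWORD]
    get_short.restype = wintypes.DWORD
    # First call with no buffer: returns the size needed, including the NUL.
    needed = get_short(long_path, None, 0)
    if needed == 0:
        raise ctypes.WinError()
    buf = ctypes.create_unicode_buffer(needed)
    # Second call fills the buffer; 0 signals failure.
    if get_short(long_path, buf, needed) == 0:
        raise ctypes.WinError()
    return buf.value
```

Usage would be along the lines of `sys.path.insert(0, short_path("c:/tmp/\u814c"))`, so that the byte-based import code only ever sees the ASCII short name.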

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

