[Python-Dev] Full unicode support for the import machinery

Victor Stinner victor.stinner at haypocalc.com
Fri Jul 9 02:11:35 CEST 2010


I have been trying for some months to fix Python to support undecodable bytes 
in the Python path. My first attempt was really huge and sometimes ugly. Where 
possible, I extracted short and simple patches and applied them to py3k 
(sometimes with an issue, sometimes directly in svn).

When it was no longer possible to split the big patch, I restarted the work from 
scratch. The main change from my previous try is that I changed import.c to 
use unicode strings instead of byte strings. With the surrogate hack (PEP 
383), unicode is a superset of bytes and so it is "forward compatible".
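
To illustrate the "superset" point, here is a minimal sketch in pure Python 
(not code from the branch). The path and the choice of UTF-8 are only example 
assumptions; the "surrogateescape" error handler is the one defined by PEP 383.

```python
# Raw bytes that are not valid UTF-8 (0xE9 is "é" in Latin-1).
raw = b"/home/caf\xe9/lib"

# With the surrogateescape error handler (PEP 383), each undecodable
# byte 0xNN is mapped to the lone surrogate U+DCNN instead of raising.
text = raw.decode("utf-8", "surrogateescape")

# Encoding with the same codec and handler restores the original bytes,
# so no information is lost by storing the path as a unicode string.
roundtrip = raw.decode("utf-8", "surrogateescape").encode(
    "utf-8", "surrogateescape")

print(ascii(text))       # shows the escaped byte as \udce9
print(roundtrip == raw)  # True
```

Because the round trip is lossless, any byte string path can be carried as a 
unicode string and converted back when it is passed to the OS.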

I just created a branch called "import_unicode" (based on py3k) including all 
my patches. It's still a work in progress. It is possible to start Python 
installed in an undecodable path (e.g. a directory with a non-ASCII character 
with the C locale on Linux), which is huge progress, but some tests are still 

The biggest remaining problem is that code object filenames are not reencoded 
after the file system encoding is changed (but sys.path and sys.modules 
filenames are reencoded). I think I will register all code objects in a list 
to be able to reencode their filename attribute (and then drop the list). 
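
As a rough sketch of what "reencoding" means here (pure Python, not the C code 
in import.c): the names reencode_filename, FakeCode and registered are 
hypothetical and exist only for this illustration, and the encodings used are 
example assumptions.

```python
def reencode_filename(filename, old_encoding, new_encoding):
    """Undo a decode done with old_encoding and redo it with new_encoding."""
    # surrogateescape gives back the exact original bytes...
    raw = filename.encode(old_encoding, "surrogateescape")
    # ...which are then decoded again with the final encoding.
    return raw.decode(new_encoding, "surrogateescape")


class FakeCode:
    """Hypothetical stand-in for a code object with a filename attribute."""
    def __init__(self, filename):
        self.filename = filename


# Objects created before the file system encoding is known are
# registered in a list, with filenames decoded as ASCII for now.
registered = [FakeCode(b"caf\xc3\xa9.py".decode("ascii", "surrogateescape"))]

# Once the real encoding is known, fix every registered filename,
# then drop the list.
for code in registered:
    code.filename = reencode_filename(code.filename, "ascii", "utf-8")
fixed = [code.filename for code in registered]
registered.clear()

print(fixed)
```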

I created an svn branch because I think it's easier to review short 
commits than a single huge patch. The branch also helps me share the 
work between different computers, and allows other people to review the 
commits (and/or contribute!).

Some people may understand my work better with the "whole picture" :-)


There are at least 4 issues related to this work:

 #3080: Full unicode import system
 #4352: imp.find_module() fails with a UnicodeDecodeError 
        when called with non-ASCII search paths
 #8611: Python3 doesn't support locale different than utf8 
        and an non-ASCII path (POSIX)
 #8988: import + coding = failure (3.1.2/win32)


Some examples of previous issues related to my secret goal (patch import 

 #8391: os.execvpe() doesn't support surrogates in env
 #8393: subprocess: support undecodable current working directory on POSIX OS
 #8412: os.system() doesn't support surrogates nor bytes
 #8485: Don't accept bytearray as filenames, or simplify the API
 #8514: Add fsencode() functions to os module
 #8610: Python3/POSIX: errors if file system encoding is None 
 (-> create initfsencoding() in pythonrun.c)
 #8715: Create PyUnicode_EncodeFSDefault() function

Victor Stinner
