[Python-Dev] Adding the 'path' module (was Re: Some RFE for review)

Guido van Rossum gvanrossum at gmail.com
Sat Jul 9 20:21:04 CEST 2005


On 7/9/05, Neil Hodgson <nyamatongwe at gmail.com> wrote:
> M.-A. Lemburg:
> 
> > I don't really buy this "trick": what if you happen to have
> > a home directory with Unicode characters in it ?
> 
>    Most people choose account names and thus home directory names that
> are compatible with their preferred locale settings: German users are
> unlikely to choose an account name that uses Japanese characters.
> Unicode is only necessary for file names that are outside your default
> locale. An administration utility may need to visit multiple users'
> home directories and so is more likely to encounter files with names
> that cannot be represented in its default locale.
> 
>    I think it would be better if sys.path could include Unicode
> entries, but I expect the code will rarely be exercised.
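Neil's point that Unicode is only needed for names outside the default
locale comes down to whether a name survives encoding into the local
code page. The helper below is a hypothetical sketch (the name
representable_in and the choice of cp1252 and cp932 as example code
pages are assumptions for illustration):

```python
def representable_in(name: str, encoding: str) -> bool:
    """Return True if `name` can be stored as an 8-bit string in `encoding`.

    Hypothetical helper: a name that fails here needs Unicode to be
    represented at all under that locale.
    """
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# A German account name fits in the Western European code page cp1252...
print(representable_in("J\u00fcrgen", "cp1252"))   # True
# ...a Japanese one does not...
print(representable_in("\u5c71\u7530", "cp1252"))  # False
# ...but it does fit in a Japanese code page such as cp932.
print(representable_in("\u5c71\u7530", "cp932"))   # True
```

An administration tool walking every user's home directory is exactly
the case where the second check starts returning False.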

Another problem is that if you can return both 8-bit strings encoded
in the local code page and Unicode strings, combining the two with
string operations will fail unless the local code page is also
Python's global default encoding (which it usually isn't -- we really
try hard to keep the default encoding 'ascii' at all times). For
example, a directory name returned in the local code page, joined with
a file name returned as Unicode via os.path.join(), triggers that
failure.
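The failure mode can be reproduced with a small emulation of the
implicit coercion: when an 8-bit string meets a Unicode string, the
bytes are decoded with the global default encoding, not the local code
page. This is a hypothetical sketch; py2_style_join is not a real API,
and cp1252 stands in for "the local code page".

```python
import os


def py2_style_join(directory, filename, default_encoding="ascii"):
    """Hypothetical emulation of implicit str/unicode coercion.

    When a byte-string directory meets a Unicode filename, the bytes are
    decoded with the *global default* encoding before joining.
    """
    if isinstance(directory, bytes) and isinstance(filename, str):
        directory = directory.decode(default_encoding)  # may raise
    return os.path.join(directory, filename)


# Directory name as 8-bit bytes in the local code page (assumed cp1252).
home = "C:\\Users\\J\u00fcrgen".encode("cp1252")

try:
    py2_style_join(home, "\u5c71\u7530.txt")
except UnicodeDecodeError as exc:
    print("join failed:", exc.reason)  # ordinal not in range(128)

# The same join succeeds only if the default encoding *is* the code page.
print(py2_style_join(home, "notes.txt", default_encoding="cp1252"))
```

For comparison, Python 3's os.path.join() refuses to mix bytes and str
path components at all and raises TypeError instead of guessing an
encoding.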

In some sense the safest approach from this POV would be to return
Unicode as soon as a name can't be encoded using the global default
encoding. IOW, normally this would return Unicode for all names
containing non-ASCII characters.
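A minimal sketch of that policy, assuming an 'ascii' default encoding
and cp1252 as the local code page (decode_if_needed is a hypothetical
name): pure-ASCII names stay 8-bit, anything else comes back as
Unicode decoded from the code page.

```python
def decode_if_needed(raw: bytes, locale_encoding: str = "cp1252",
                     default_encoding: str = "ascii"):
    """Hypothetical policy: return Unicode only when the name cannot be
    represented in the global default encoding."""
    try:
        raw.decode(default_encoding)
    except UnicodeDecodeError:
        # Non-ASCII name: hand back Unicode decoded from the local code page.
        return raw.decode(locale_encoding)
    # Pure ASCII: safe to mix with Unicode under an 'ascii' default.
    return raw


names = [b"readme.txt", "J\u00fcrgen.doc".encode("cp1252")]
print([type(decode_if_needed(n)).__name__ for n in names])  # ['bytes', 'str']
```

Because every 8-bit name that survives is pure ASCII, the implicit
ASCII decode during concatenation can no longer fail; the cost is that
callers now see a mix of types.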

The problem is of course that while the I/O functions will handle this
fine, *printing* Unicode still doesn't work by default. :-( I can't
wait until we switch everything to Unicode and have encoding on all
streams...
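One way to picture "encoding on all streams" is Python 3's
io.TextIOWrapper, where the text stream itself carries the encoding
and error handler. The sketch below uses an in-memory buffer as a
stand-in for a console; the helper name emit and the chosen encodings
are assumptions for illustration.

```python
import io


def emit(text: str, encoding: str, errors: str = "strict") -> bytes:
    """Write `text` through a text stream that owns its encoding (a
    stand-in for sys.stdout) and return the bytes that reach the device."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, errors=errors)
    stream.write(text)
    stream.flush()
    return raw.getvalue()


print(emit("J\u00fcrgen", "utf-8"))                      # b'J\xc3\xbcrgen'
print(emit("J\u00fcrgen", "ascii", "backslashreplace"))  # b'J\\xfcrgen'

try:
    emit("J\u00fcrgen", "ascii")  # strict ASCII "console"
except UnicodeEncodeError:
    print("printing failed on an ASCII stream")
```

Even with the encoding attached to the stream, a strict ASCII console
still rejects the name; the error handler is what decides whether
printing degrades gracefully.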

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

