[Python-Dev] My work on Python3 and non-ascii paths is done

Victor Stinner victor.stinner at haypocalc.com
Fri Oct 22 14:01:44 CEST 2010


On Thursday 21 October 2010 21:14:55, Toshio Kuratomi wrote:
> > That's exactly what I was looking for!  Thanks.  I think you've learned a
> > huge amount of good information that's difficult to find, so writing it
> > up in a more permanent and easy to find location will really help future
> > Python developers!
> 
> One further thing I'd be interested in is if you could document any best
> practices from this experience.  Things like, "surrogateescape is a
> good/bad default in these cases",

I advise using PEP 383 (surrogateescape) when the *native* data type is 
bytes. Some examples:
 - filenames on UNIX/BSD
 - environment variables on UNIX/BSD
 - well, most data sent to/received from the system on UNIX/BSD :-)
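
To illustrate the round trip that PEP 383 makes possible, here is a minimal
sketch. The filename is a made-up example, and the codec is hard-coded to
UTF-8 for the sake of the demonstration; in practice the filesystem encoding
depends on the locale.

```python
# Sketch: round-tripping undecodable bytes through str with PEP 383.
# "raw" is a hypothetical Latin-1 encoded filename seen on a UTF-8 system.
raw = b"caf\xe9.txt"

# With surrogateescape, the invalid byte 0xE9 is mapped to the lone
# surrogate U+DCE9 instead of raising UnicodeDecodeError.
name = raw.decode("utf-8", "surrogateescape")

# Encoding with the same error handler restores the original bytes exactly.
restored = name.encode("utf-8", "surrogateescape")
```

The str value can be passed around like any other string, and the original
bytes are only recovered when it is encoded again with the same handler.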

For network protocols, I don't know. It looks like the new email modules will 
offer two API levels: low level (native type) using bytes, high level using 
str (unicode). I don't know whether the high level API uses PEP 383 or not.

PEP 383 can be used to avoid UnicodeDecodeError. But sometimes it's better to 
raise an error to warn the user that the encoding is incorrect or the input 
data is invalid (well, at least not valid according to the encoding).
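
The trade-off can be sketched as follows. The helper name decode_strict and
the sample data are hypothetical, chosen only for this illustration.

```python
# Sketch: strict decoding reports bad input, surrogateescape hides it.
data = b"caf\xe9"  # hypothetical input that is not valid UTF-8


def decode_strict(payload):
    """Return (text, None) on success or (None, message) on invalid input."""
    try:
        return payload.decode("utf-8"), None
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first offending byte.
        return None, "invalid UTF-8 at byte offset %d" % exc.start


# Strict mode: the caller is told that the data does not match the encoding.
text, error = decode_strict(data)

# Lenient mode: no error, the bad byte survives as a lone surrogate.
lenient = data.decode("utf-8", "surrogateescape")
```

Which behaviour is appropriate depends on whether the user can do something
useful with the error.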

I don't use strict rules. Each problem is different. E.g. it looks like not 
everybody agrees on using PEP 383 for host/domain names (issue #9377; I 
didn't read the whole issue, just a few lines).

> When is parallel functions for bytes and str better than a single
> polymorphic function?

If the output type cannot be determined from the input types, it's better to 
have two functions.

Examples:
 - 2 functions: os.getcwd() / os.getcwdb()
 - polymorphic: os.path.*()
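
A short sketch of the two styles, using only the functions named above. The
path components are arbitrary examples.

```python
import os

# Two functions: os.getcwd() takes no argument, so nothing in the call can
# select the result type. Each variant has a fixed return type instead.
cwd_text = os.getcwd()     # always str
cwd_bytes = os.getcwdb()   # always bytes

# Polymorphic: os.path.join() returns the same type as its arguments.
joined_text = os.path.join("dir", "file.txt")
joined_bytes = os.path.join(b"dir", b"file.txt")
```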

But you should never accept mixed types, e.g. os.path.join(b'bytes', 'unicode') 
has to raise a TypeError.
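
As a sketch of that rule: the mixed call is rejected, and the caller has to
convert explicitly. The conversion below assumes os.fsdecode(), which is
available in Python 3.2 and later; it is one possible way to convert, not the
only one.

```python
import os

# Mixing bytes and str components is rejected with TypeError.
try:
    os.path.join(b"bytes", "unicode")
except TypeError:
    mixed_rejected = True
else:
    mixed_rejected = False

# The caller converts explicitly so that both components are str.
# os.fsdecode() is an assumption here: it needs Python 3.2 or later.
joined = os.path.join(os.fsdecode(b"bytes"), "unicode")
```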

-- 
Victor Stinner
http://www.haypocalc.com/

