[Python-Dev] Python-3.0, unicode, and os.environ

Tres Seaver tseaver at palladion.com
Sat Dec 6 05:57:01 CET 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ulrich Eckhardt wrote:
> On Friday 05 December 2008, Guido van Rossum wrote:
>> At the risk of bringing up something that was already rejected, let me
>> propose something that follows the path taken in 3.0 for filenames,
>> rather than doubling back:
>>
>> For os.environ, os.getenv() and os.putenv(), I think a similar
>> approach as used for os.listdir() and os.getcwd() makes sense: let
>> os.environ skip variables whose name or value is undecodable, and have
>> a separate os.environb() which contains bytes; let os.getenv() and
>> os.putenv() do the right thing when the arguments passed in are bytes.
>>
>> For sys.argv, because it's positional, you can't skip undecodable
>> values, so I propose to use errors='replace' for the decoding; again,
>> we can add sys.argvb that contains the raw bytes values. The various
>> os.exec*() and os.spawn*() calls (as well as os.system(), os.popen()
>> and the subprocess module) should all accept bytes as well as strings.
>>
>> On Windows, the bytes APIs should probably not exist.
>>
>> I predict that most developers can get away with not using the bytes
>> APIs at all. The small minority that needs to be robust if not all
>> filenames use the system encoding can use the bytes APIs.
> 
> I know some of those developers; you can contact them via 
> python-dev at python.org. Seriously, what would you suggest to someone who 
> wants to handle paths in a portable way? Using the Unicode variants of 
> functions is fubar, because encoding/decoding is not universally possible. 
> Using the byte variant is equally fubar, because e.g. on MS Windows it is not 
> supported, except through a very lossy roundtrip through the locale's 
> codepage, limiting your functionality.
> 
> I actually think it is about time to give up on trying to think about a path 
> as a string. Ditto for data received from os.environ or sys.argv. There are 
> only very few things that are universal to them and a reliable encoding is 
> none of them. Then, once you have let that idea go, meditate a bit over the 
> Zen.
> 
> What I propose is that paths must be treated as OS-specific, with the only 
> common reliable operations being joining them, concatenating them and 
> splitting them into segments divided by the (again, OS-specific) separator. 
> Other operations, e.g. appending a string or converting the path to a string 
> in order to display it, can fail. And if they fail, they should fail noisily. 
> In 99% of all cases, using the default encoding will work and do what people 
> expect, which is why I would make this conversion automatic. In all other 
> cases, it will at least not fail silently (which would lead to garbage and 
> data loss) and allow more sophisticated applications to handle it.
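
To make the "fail noisily" versus "fail silently" distinction in the quoted
text concrete, here is a minimal sketch. It assumes UTF-8 as the expected
encoding and uses a made-up Latin-1 filename; neither comes from the thread.

```python
# A filename encoded as Latin-1; the byte 0xE9 is not valid UTF-8.
raw = b"caf\xe9.txt"

# Strict decoding fails noisily: the caller learns the name is undecodable.
try:
    raw.decode("utf-8")
    failed = False
except UnicodeDecodeError:
    failed = True

# errors="replace" succeeds, but the original byte is gone for good.
lossy = raw.decode("utf-8", errors="replace")
round_trip = lossy.encode("utf-8")

print(failed)             # whether strict decoding raised
print(round_trip == raw)  # whether the bytes survived the round trip
```

With the lossy variant the program keeps running, but the re-encoded name no
longer refers to the original file.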

Amen!  The idea that paths, environment variables, and stuff pulled off
of sockets can be treated as text rather than byte strings is just wishful
thinking.
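
One way to picture the proposal quoted above is a small wrapper that keeps
the raw OS representation and only converts to text on request. This is a
rough sketch, not an existing API: the class name OSPath, its methods, and
the UTF-8 default are all assumptions made for illustration.

```python
import os


class OSPath:
    """Hypothetical wrapper that stores a path as raw bytes."""

    def __init__(self, raw: bytes):
        if not isinstance(raw, bytes):
            raise TypeError("raw must be bytes")
        self.raw = raw

    def join(self, other: "OSPath") -> "OSPath":
        # Joining works on the raw bytes and never needs an encoding.
        return OSPath(os.path.join(self.raw, other.raw))

    def segments(self) -> list:
        # Split on the OS-specific separator, dropping empty segments.
        sep = os.sep.encode("ascii")
        return [OSPath(part) for part in self.raw.split(sep) if part]

    def to_text(self, encoding: str = "utf-8") -> str:
        # Strict decoding: raises UnicodeDecodeError rather than
        # silently producing garbage. UTF-8 is an assumed default.
        return self.raw.decode(encoding, errors="strict")


path = OSPath(b"data").join(OSPath(b"caf\xe9.txt"))
print(len(path.segments()))

try:
    path.to_text()
    decoded = True
except UnicodeDecodeError:
    decoded = False
print(decoded)
```

Joining and splitting succeed regardless of encoding; only the conversion for
display can fail, and when it does the caller gets an exception to handle.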


Tres.
- --
===================================================================
Tres Seaver          +1 540-429-0999          tseaver at palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJOgYd+gerLs4ltQ4RArQFAKDUZLXjwsIvNfNji4hbqM/aOZ0lMQCfRBq/
DHdYt2GGA1CrYA4a5pj+AZ4=
=4CcT
-----END PGP SIGNATURE-----

