[Python-Dev] Python-3.0, unicode, and os.environ

Ulrich Eckhardt eckhardt at satorlaser.com
Fri Dec 5 11:27:35 CET 2008


On Friday 05 December 2008, Guido van Rossum wrote:
> At the risk of bringing up something that was already rejected, let me
> propose something that follows the path taken in 3.0 for filenames,
> rather than doubling back:
>
> For os.environ, os.getenv() and os.putenv(), I think a similar
> approach as used for os.listdir() and os.getcwd() makes sense: let
> os.environ skip variables whose name or value is undecodable, and have
> a separate os.environb() which contains bytes; let os.getenv() and
> os.putenv() do the right thing when the arguments passed in are bytes.
>
> For sys.argv, because it's positional, you can't skip undecodable
> values, so I propose to use error=replace for the decoding; again, we
> can add sys.argvb that contains the raw bytes values. The various
> os.exec*() and os.spawn*() calls (as well as os.system(), os.popen()
> and the subprocess module) should all accept bytes as well as strings.
>
> On Windows, the bytes APIs should probably not exist.
>
> I predict that most developers can get away with not using the bytes
> APIs at all. The small minority that needs to be robust if not all
> filenames use the system encoding can use the bytes APIs.

I know some of those developers; you can contact them via
python-dev at python.org. Seriously, what would you suggest to someone who
wants to handle paths in a portable way? Using the Unicode variants of the
functions is fubar, because encoding/decoding is not universally possible.
Using the bytes variants is equally fubar, because on MS Windows, for example,
they are not supported except through a very lossy roundtrip through the
locale's codepage, which limits your functionality.
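
To make the decoding problem concrete, here is a small sketch. The file name
is made up, and UTF-8 is only assumed as the system encoding; the helper names
strict_decode and lossy_roundtrip are illustrative, not existing APIs.

```python
# Hypothetical file name as a POSIX file system might store it: Latin-1
# bytes, while the (assumed) system encoding is UTF-8.
raw_name = b"caf\xe9.txt"


def strict_decode(name: bytes, encoding: str = "utf-8"):
    """Return the decoded name, or None if the bytes are not valid."""
    try:
        return name.decode(encoding)
    except UnicodeDecodeError:
        return None


def lossy_roundtrip(name: bytes, encoding: str = "utf-8") -> bytes:
    """Decode with errors='replace', then encode again."""
    return name.decode(encoding, errors="replace").encode(encoding)


# Strict decoding has no answer for this name ...
print(strict_decode(raw_name))
# ... and the 'replace' variant yields bytes that no longer name the file.
print(lossy_roundtrip(raw_name) == raw_name)
```

Either way, a program that only holds the decoded string cannot reliably get
back to the file it was given.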

I actually think it is about time to give up on trying to think about a path
as a string. Ditto for data received from os.environ or sys.argv. Very few
things are universal to them, and a reliable encoding is not one of them.
Then, once you have let that idea go, meditate a bit over the Zen.

What I propose is that paths must be treated as OS-specific, with the only
common reliable operations being joining them, concatenating them and
splitting them into segments divided by the (again, OS-specific) separator.
Other operations, such as appending a string to a path or converting a path
to a string in order to display it, can fail. And if they fail, they should
fail noisily. In 99% of all cases, using the default encoding will work and do
what people expect, which is why I would make this conversion automatic. In
all other cases, it will at least not fail silently (which would lead to
garbage and data loss), and it will allow more sophisticated applications to
handle the failure themselves.
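
A rough sketch of what I mean, in Python. The class OpaquePath and all of its
method names are hypothetical, invented here for illustration; it stores raw
bytes, which is an assumption that fits POSIX better than MS Windows.

```python
import os
import sys


class OpaquePath:
    """Hypothetical OS-specific path: opaque bytes plus a separator."""

    def __init__(self, raw: bytes, sep: bytes = None, encoding: str = None):
        self.raw = raw
        # Assumption: the separator is a single ASCII byte.
        self.sep = sep if sep is not None else os.sep.encode("ascii")
        self.encoding = encoding or sys.getfilesystemencoding()

    def join(self, other: "OpaquePath") -> "OpaquePath":
        # Joining never needs to know the encoding of either part.
        joined = self.raw.rstrip(self.sep) + self.sep + other.raw
        return OpaquePath(joined, self.sep, self.encoding)

    def segments(self) -> list:
        # Splitting only looks at the separator, not at the contents.
        return [OpaquePath(part, self.sep, self.encoding)
                for part in self.raw.split(self.sep) if part]

    def __str__(self) -> str:
        # Automatic conversion with the default encoding; strict decoding
        # raises UnicodeDecodeError instead of silently producing garbage.
        return self.raw.decode(self.encoding)


base = OpaquePath(b"/home", sep=b"/", encoding="utf-8")
path = base.join(OpaquePath(b"caf\xe9", sep=b"/", encoding="utf-8"))
try:
    print(str(path))
except UnicodeDecodeError as exc:
    print("cannot display path:", exc.reason)
```

The join and the split succeed even for the undecodable name; only the
conversion for display fails, and it does so with an exception the
application can catch.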

Uli

-- 
Sator Laser GmbH
Managing Director: Thorsten Föcking, Amtsgericht Hamburg HR B62 932
