[Python-Dev] Python-3.0, unicode, and os.environ

Ulrich Eckhardt eckhardt at satorlaser.com
Thu Dec 11 14:41:46 CET 2008


On Thursday 11 December 2008, Steve Holden wrote:
> Ulrich Eckhardt wrote:
> > What I'd just like some feedback on is the approach to return a distinct
> > type (neither a byte string nor a Unicode string) from readdir(). In
> > order to use this, a programmer will have to convert it explicitly,
> > otherwise e.g. printing it will just produce <env_string at 0x01234567>.
> > This will immediately bump each programmer with their heads on the issue
> > of unknown encodings and they will have to make the application-specific
> > choice whether an approximation of the filename, an exception or ignoring
> > the file is the right choice. Also, it presents the options for doing
> > this conversion in a single class, which I personally find much better
> > than providing overloads for hundreds of functions.
[...]
>
> Seems to me this just threatens to add to the confusion.
>
> If you know what your filesystem produces, you can take the appropriate
> action to convert it into a type that makes sense to the user. If you
> don't, then at least if you have the string in its bytes form you can
                                       ^^^^^^^^^^^^^^^^^^^

There are operating systems that don't use bytes to represent a file path, 
namely all the MS Windows variants. Even worse, when you use a byte string 
there, it typically means that you want to use the obsolete encoding that is 
based on codepages.

Why can we not preserve the representation of a path as it is? Why do we 
_have_ to convert it to anything at all, without even knowing if this 
conversion is needed? I just want to do something to a file's content. Why 
does its path have to be converted to something and then be converted back in 
order for the system to digest it?

> re-present it to the filesystem to manipulate the file. What are we
> supposed to do with the "special type"?

You receive it from readdir() and pass it to stat(), simple as that. No 
conversion from the native representation is needed. If you need a textual 
representation, you have to convert it, and you have to do so explicitly, 
according to whatever logic your application requires.
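
To make that concrete, here is a minimal sketch of what such a type could look 
like. Everything in it is an assumption for illustration: the names NativePath, 
to_text() and list_native() are made up, the native form is taken to be bytes 
(as on POSIX systems; on MS Windows the wrapped value would be a wide string 
instead), and it leans on os.fsencode() and the __fspath__ protocol, which exist 
in current Python versions but not in 3.0.

```python
import os
import tempfile


class NativePath:
    """Hypothetical opaque wrapper around a path in its native (bytes) form."""

    __slots__ = ("_raw",)

    def __init__(self, raw):
        if not isinstance(raw, bytes):
            raise TypeError("NativePath wraps the raw bytes from the OS")
        self._raw = raw

    def __fspath__(self):
        # Hands the untouched representation back to os.stat(), open(), ...
        return self._raw

    def __repr__(self):
        # Deliberately not the file name: printing forces an explicit choice.
        return "<NativePath at 0x%x>" % id(self)

    def to_text(self, encoding, errors="strict"):
        # The only way to get text; the caller picks encoding and error policy.
        return self._raw.decode(encoding, errors)


def list_native(directory):
    """Stand-in for readdir(): yield entries without decoding them."""
    raw_dir = os.fsencode(directory)
    for name in os.listdir(raw_dir):
        yield NativePath(os.path.join(raw_dir, name))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        open(os.path.join(tmp, "data.txt"), "w").close()
        for entry in list_native(tmp):
            info = os.stat(entry)  # no conversion needed
            print(entry)           # prints the opaque <NativePath at 0x...>
            print(entry.to_text("utf-8", errors="replace"), info.st_size)
```

The point of the sketch is that os.stat() never sees a decoded name, while 
anything meant for a human has to go through to_text() with an encoding and an 
error policy chosen by the application.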

If readdir() returned Unicode text, people would start taking that for 
granted. If it returned bytes, they would do just the same. Returning a 
completely unrelated type will give them enough of a hint that they have to 
rethink their assumptions about this thing. This runs along the lines of "In 
the face of ambiguity, refuse the temptation to guess.", as it makes guessing 
all but impossible.
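
As a rough illustration of the choices an application is then forced to make 
(an approximation of the filename, an exception, or ignoring the file), 
consider the following sketch. The helper decode_name() and its policy values 
are hypothetical; only bytes.decode() and its "replace" error handler are 
standard Python.

```python
def decode_name(raw, encoding="utf-8", policy="raise"):
    """Convert a raw file name to text under an explicitly chosen policy.

    decode_name and its policy values are hypothetical, for illustration only.
    """
    if policy == "approximate":
        # Undecodable bytes become U+FFFD; the result may not round-trip.
        return raw.decode(encoding, errors="replace")
    if policy == "skip":
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            return None  # caller ignores this entry
    if policy == "raise":
        return raw.decode(encoding)  # UnicodeDecodeError propagates
    raise ValueError("unknown policy: %r" % (policy,))


names = [b"report.txt", b"caf\xe9.txt"]  # second name is Latin-1, not UTF-8
print([decode_name(n, policy="approximate") for n in names])
print([decode_name(n, policy="skip") for n in names])
```

None of the three policies is right for every program, which is exactly why 
the library should not pick one silently.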

I just don't see a case where using a separate path class would break things. 
Further, the special handling that is required would be made even clearer by 
using such a class.

Uli
