[Python-Dev] Python-3.0, unicode, and os.environ

Steve Holden steve at holdenweb.com
Thu Dec 11 18:46:57 CET 2008

Ulrich Eckhardt wrote:
> On Thursday 11 December 2008, Steve Holden wrote:
>> Ulrich Eckhardt wrote:
>>> What I'd just like some feedback on is the approach to return a distinct
>>> type (neither a byte string nor a Unicode string) from readdir(). In
>>> order to use this, a programmer will have to convert it explicitly,
>>> otherwise e.g. printing it will just produce <env_string at 0x01234567>.
>>> This will immediately confront every programmer with the issue of
>>> unknown encodings, and they will have to make the application-specific
>>> choice whether an approximation of the filename, an exception or ignoring
>>> the file is the right choice. Also, it presents the options for doing
>>> this conversion in a single class, which I personally find much better
>>> than providing overloads for hundreds of functions.
> [...]
>> Seems to me this just threatens to add to the confusion.
>> If you know what your filesystem produces, you can take the appropriate
>> action to convert it into a type that makes sense to the user. If you
>> don't, then at least if you have the string in its bytes form you can
>                                        ^^^^^^^^^^^^^^^^^^^
> There are operating systems that don't use bytes to represent a file path, 
> namely all the MS Windows variants. Even worse, when you use a byte string 
> there, it typically means that you want to use the obsolete encoding that is 
> based on codepages.
> Why can we not preserve the representation of a path as it is? Why do we 
> _have_ to convert it to anything at all, without even knowing if this 
> conversion is needed? I just want to do something to a file's content, why 
> does its path have to be converted to something and then be converted back in 
> order for the system to digest it?
You don't: that was my point. You only need to perform any kind of
conversion when the filename has to be presented to something other than
the file system.
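
To make that concrete, here is a rough sketch of what I mean, assuming a
platform where file names are bytes and using the bytes form of os.listdir().
The helper names list_with_sizes and display_name are made up for
illustration, and UTF-8 is only a guessed default for the display step.

```python
import os


def list_with_sizes(directory: bytes):
    """Yield (raw_name, size) pairs; names stay as bytes the whole way."""
    # Passing bytes to os.listdir() makes it return bytes names,
    # so nothing is decoded on the way in.
    for raw_name in os.listdir(directory):
        raw_path = os.path.join(directory, raw_name)
        # The same bytes go straight back to the file system.
        yield raw_name, os.stat(raw_path).st_size


def display_name(raw_name: bytes, encoding: str = "utf-8") -> str:
    """Lossy conversion, meant for showing a name to a user only."""
    # errors="replace" substitutes U+FFFD for undecodable bytes, so the
    # result must not be handed back to the file system.
    return raw_name.decode(encoding, errors="replace")
```

The only place an encoding shows up is display_name(); everything that
touches the file system works on the bytes it was given.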

>> re-present it to the filesystem to manipulate the file. What are we
>> supposed to do with the "special type"?
> You receive it from readdir() and pass it to stat(), simple as that. No 
> conversions from the native representation needed. If you need a textual 
> representation, then you have to convert it and you have to do so explicitly 
> according to whatever logic your application requires.

> If readdir() returned Unicode text, people would start taking that for 
> granted. If it returned bytes, just the same. Returning a completely 
> unrelated type will give them enough of a hint that for this thing they have to 
> rethink their assumptions. This runs along the lines of "In the face of 
> ambiguity, refuse the temptation to guess.", as it makes guessing rather 
> impossible.
So you are suggesting this "special object" be used only to represent
files to users? Now I understand.

> I just don't see a case where using a separate path class would break things. 
> Further, the special handling that is required would be made even clearer by 
> using such a class.
But it does have to be implemented ...
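
For the sake of discussion, a minimal sketch of what such a type might look
like. This is not the proposal's actual design: the class name NativeName,
its methods, and read_directory() are all hypothetical, the raw form is
assumed to be bytes, and it relies on the __fspath__ protocol, which
only exists in later Python versions (3.6 and up).

```python
import os


class NativeName:
    """Hypothetical opaque wrapper around a file name in its native form."""

    def __init__(self, raw: bytes):
        self._raw = raw

    # No __str__ or __repr__ is defined on purpose: printing an instance
    # shows the default "<... object at 0x...>" form, not a guessed text.

    def __fspath__(self) -> bytes:
        # Lets os.stat(), open() and friends accept the object unchanged.
        return self._raw

    def to_text(self, encoding: str = "utf-8") -> str:
        """Strict conversion; raises UnicodeDecodeError on bad bytes."""
        return self._raw.decode(encoding)

    def to_approx_text(self, encoding: str = "utf-8") -> str:
        """Approximate conversion; undecodable bytes become U+FFFD."""
        return self._raw.decode(encoding, errors="replace")


def read_directory(directory: str):
    """Hypothetical readdir() stand-in returning NativeName objects."""
    return [NativeName(raw) for raw in os.listdir(os.fsencode(directory))]
```

With this, os.stat(os.path.join(os.fsencode(directory), name)) works without
any conversion, while showing the name to a user forces a choice between
to_text() and to_approx_text().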

Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/
