[Python-Dev] Python-3.0, unicode, and os.environ
steve at holdenweb.com
Thu Dec 11 18:46:57 CET 2008
Ulrich Eckhardt wrote:
> On Thursday 11 December 2008, Steve Holden wrote:
>> Ulrich Eckhardt wrote:
>>> What I'd just like some feedback on is the approach to return a distinct
>>> type (neither a byte string nor a Unicode string) from readdir(). In
>>> order to use this, a programmer will have to convert it explicitly,
>>> otherwise e.g. printing it will just produce <env_string at 0x01234567>.
>>> This will immediately confront every programmer with the issue of unknown
>>> encodings, and they will have to decide, per application, whether an
>>> approximation of the filename, an exception, or ignoring the file is the
>>> right choice. Also, it presents the options for doing
>>> this conversion in a single class, which I personally find much better
>>> than providing overloads for hundreds of functions.
>> Seems to me this just threatens to add to the confusion.
>> If you know what your filesystem produces, you can take the appropriate
>> action to convert it into a type that makes sense to the user. If you
>> don't, then at least if you have the string in its bytes form you can
> There are operating systems that don't use bytes to represent a file path,
> namely all the MS Windows variants. Even worse, when you use a byte string
> there, it typically means that you want to use the obsolete encoding that is
> based on codepages.
> Why can we not preserve the representation of a path as it is? Why do we
> _have_ to convert it to anything at all, without even knowing if this
> conversion is needed? I just want to do something to a file's content, why
> does its path have to be converted to something and then be converted back in
> order for the system to digest it?
You don't: that was my point. You only need to perform any kind of
conversion when the filename has to be presented to something other than
the file system.
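
A rough sketch of what I mean, assuming a POSIX-style system and the Python 3
behaviour where os.listdir() returns bytes names when given a bytes path. The
helper name list_entries and the choice of UTF-8 with replacement characters
for display are purely illustrative, not anything os itself prescribes:

```python
import os
import tempfile


def list_entries(directory):
    """Yield (raw_name, size, display_name) for each entry in `directory`.

    `list_entries` is a hypothetical helper written for this illustration.
    """
    raw_dir = os.fsencode(directory)          # str or bytes -> bytes
    for raw_name in os.listdir(raw_dir):      # bytes in, bytes names out
        raw_path = os.path.join(raw_dir, raw_name)
        # The raw bytes go straight back to the file system: no decoding.
        size = os.stat(raw_path).st_size
        # Conversion happens only here, where a human has to read the name.
        # UTF-8 with replacement is an assumption made for display only.
        display = raw_name.decode("utf-8", errors="replace")
        yield raw_name, size, display


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "data.txt"), "wb") as f:
            f.write(b"hello")
        for raw_name, size, display in list_entries(tmp):
            print(display, size)
```

The raw name is what gets handed back to stat(); the decoded form is never
used to reach the file, so a name that does not decode cleanly is still
usable.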
>> re-present it to the filesystem to manipulate the file. What are we
>> supposed to do with the "special type"?
> You receive it from readdir() and pass it to stat(), simple as that. No
> conversions from the native representation needed. If you need a textual
> representation, then you have to convert it and you have to do so explicitly
> according to whatever logic your application requires.
> If readdir() returned Unicode text, people would start taking that for
> granted. If it returned bytes, just the same. Returning a completely
> unrelated type will give them enough hint that for this thing they have to
> rethink their assumptions. This runs along the lines of "In the face of
> ambiguity, refuse the temptation to guess.", as it makes guessing rather
So you are suggesting this "special object" be used only to represent
files to users? Now I understand.
> I just don't see a case where using a separate path class would break things.
> Further, the special handling that is required would be made even clearer by
> using such a class.
But it does have to be implemented ...
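
For the sake of discussion, a minimal sketch of what such a class might look
like. Everything here is an assumption: the names NativeName, decode() and
read_dir() are made up, and the sketch leans on the __fspath__ protocol that
later Python versions (3.6 and up) provide so that os.stat() accepts the
object directly. It is not the proposal's actual design:

```python
import os


class NativeName:
    """Hypothetical opaque wrapper around a path as the OS returned it."""

    __slots__ = ("_raw",)

    def __init__(self, raw):
        self._raw = raw  # bytes, kept exactly as received

    def __fspath__(self):
        # Lets os.stat(), open() etc. take the object without conversion.
        return self._raw

    def decode(self, encoding, errors="strict"):
        # The only route to text: the caller must name an encoding and an
        # error policy ("strict", "replace", "ignore", ...).
        return self._raw.decode(encoding, errors)

    # No __str__ or __repr__ override: printing shows the default object
    # repr rather than a guessed text form.


def read_dir(directory):
    """Hypothetical readdir(): return NativeName objects for each entry."""
    raw_dir = os.fsencode(directory)
    return [NativeName(os.path.join(raw_dir, name))
            for name in os.listdir(raw_dir)]
```

With this, os.stat(entry) works on each result of read_dir() as is, while
print(entry) shows only the default object repr until the caller picks an
encoding via entry.decode(...).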
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/