[Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

M.-A. Lemburg mal at egenix.com
Tue Sep 30 17:20:42 CEST 2008


On 2008-09-30 16:05, Guido van Rossum wrote:
> On Tue, Sep 30, 2008 at 3:31 AM, M.-A. Lemburg <mal at egenix.com> wrote:
>> On 2008-09-30 08:00, Martin v. Löwis wrote:
>>>> Change the default file system encoding to store bytes in Unicode is like
>>>> introducing a new Python type: <fake Unicode for filename hacks>.
>>> Exactly. Seems like the best solution to me, despite your polemics.
>> Not a bad idea... have os.listdir() return Unicode subclasses that work
>> like file handles, ie. they have an extra buffer that holds the original
>> bytes value received from the underlying C API.
>>
>> Passing these handles to open() would then do the right thing by using
>> whatever os.listdir() got back from the file system to open the file,
>> while still providing a sane way to display the filename, e.g. using
>> question marks for the invalid characters.
>>
>> The only problem with this approach is concatenation of such handles
>> to form pathnames, but then perhaps those concatenations could just
>> work on the bytes value as well (I don't know of any OS that uses non-
>> ASCII path separators).
> 
> While this seems to work superficially I expect an infinite number of
> problems caused by code that doesn't understand this subclass. You are
> hinting at this in your last paragraph.

Well, to some extent Unicode objects themselves already implement
such a strategy: the default encoded bytes object basically provides
the low-level interfacing value.

But I agree, the approach is not foolproof.

In the end, I think it's better not to be clever and just return
the filenames that cannot be decoded as bytes objects in os.listdir().

Passing those to open() will then open the files as expected; in most
other cases the application will have to provide explicit conversions
in whatever way best fits the application.
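
To illustrate what "explicit conversions" could look like, here is a
minimal sketch. It assumes the behaviour proposed above, i.e. a listing
that mixes str entries (decodable names) with bytes entries (names that
could not be decoded). The helper display_name(), the sample list and the
choice of UTF-8 are all hypothetical, made up for this example:

```python
def display_name(name, encoding="utf-8"):
    """Return a printable str for a filename that may be str or bytes.

    Hypothetical helper: the application picks the encoding and the
    error policy that suit it.
    """
    if isinstance(name, bytes):
        # Undecodable bytes are shown as U+FFFD replacement characters.
        # The result is for display only and must not be used to reopen
        # the file, because the original bytes are lost.
        return name.decode(encoding, errors="replace")
    return name


# Assumed shape of a mixed listing under the proposal: one decodable
# name and one Latin-1 encoded name that is not valid UTF-8.
entries = ["readme.txt", b"caf\xe9.txt"]

# Keep the original entry for open(), and a decoded copy for display.
shown = [display_name(entry) for entry in entries]
```

The original entry, whether str or bytes, stays the value that gets
passed to open(); only the copy used for display is converted.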

Also note that os.listdir() isn't the only source of filenames. You
often read them from a file, a database, some socket, etc., so letting
the application decide what to do is not asking too much, IMHO.

-- 
Marc-Andre Lemburg
eGenix.com


