On Tue, Sep 30, 2008 at 3:31 AM, M.-A. Lemburg <mal@egenix.com> wrote:
On 2008-09-30 08:00, Martin v. Löwis wrote:
Changing the default file system encoding to store bytes in Unicode is like introducing a new Python type: <fake Unicode for filename hacks>.
Exactly. Seems like the best solution to me, despite your polemics.
Not a bad idea... have os.listdir() return Unicode subclasses that work like file handles, i.e. they have an extra buffer that holds the original bytes value received from the underlying C API.
Passing these handles to open() would then do the right thing by using whatever os.listdir() got back from the file system to open the file, while still providing a sane way to display the filename, e.g. using question marks for the invalid characters.
The only problem with this approach is concatenation of such handles to form pathnames, but then perhaps those concatenations could just work on the bytes value as well (I don't know of any OS that uses non-ASCII path separators).
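To make the idea concrete, here is a minimal sketch in present-day Python syntax. It is only an illustration of the proposal, not an existing API: the class name FilenameHandle, its raw attribute, the joined() helper, the UTF-8 default and the "/" separator are all assumptions made up for this example. It uses U+FFFD where the message talks about question marks, and it passes the bytes to open() explicitly rather than relying on open() to pick them up from the subclass.

```python
class FilenameHandle(str):
    """Hypothetical str subclass that keeps the original bytes of a filename."""

    def __new__(cls, raw, encoding="utf-8"):
        # Display form: undecodable bytes become U+FFFD instead of raising.
        self = super().__new__(cls, raw.decode(encoding, errors="replace"))
        # The extra buffer holding exactly what the file system returned.
        self.raw = raw
        return self

    def joined(self, child):
        # Concatenate on the bytes value; b"/" assumes a POSIX-style separator.
        return FilenameHandle(self.raw + b"/" + child.raw)


name = FilenameHandle(b"caf\xe9.txt")        # Latin-1 bytes, invalid as UTF-8
path = FilenameHandle(b"dir").joined(name)   # still a handle, bytes preserved

# Ordinary str operations return a plain str and silently drop the bytes.
plain = name + ".bak"

# Opening the file would have to go through the bytes explicitly, e.g.:
#     open(path.raw, "rb")
```

Note that plain above is an ordinary str with no raw attribute, which is the concatenation problem in a nutshell: only code that knows about the subclass and calls something like joined() keeps the original bytes.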
While this seems to work superficially, I expect an infinite number of problems caused by code that doesn't understand this subclass. You are hinting at this in your last paragraph.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)