os.listdir, gets unicode, returns unicode... USUALLY?!?!?

gabor gabor at nekomancer.net
Mon Nov 20 00:59:41 CET 2006

Martin v. Löwis wrote:
> gabor wrote:
>>> I may have missed something, but did you present a solution that would
>>> make the case above work?
>> if we use the same decoding flags as binary-string.decode(),
>> then we could do:
>> [os.path.join(path,n) for n in os.listdir(path,'ignore')]
> That wouldn't work. The characters in the file name that didn't
> decode would be dropped, so the resulting file names would be
> invalid. Trying to do os.stat() on such a file name would raise
> an exception that the file doesn't exist.
>> [os.path.join(path,n) for n in os.listdir(path,'replace')]
> Likewise. The characters would get replaced with REPLACEMENT
> CHARACTER; passing that to os.stat would give an encoding
> error.
>> it's not an elegant solution, but it would solve i think most of the
>> problems.
> No, it wouldn't. This idea is as bad or worse than just dropping
> these file names from the directory listing.
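
To make the objection concrete, here is a minimal Python 3 sketch. It does not use the proposed `os.listdir(path, errors)` signature, which does not exist. It only calls `bytes.decode()` with the same error flags on a made-up file name: the Latin-1 bytes for "café.txt", which are not valid UTF-8. It assumes a UTF-8 file-system encoding.

```python
# Hypothetical file name: Latin-1 bytes, not valid UTF-8.
raw = b"caf\xe9.txt"

# 'ignore' silently drops the undecodable byte.
ignored = raw.decode("utf-8", "ignore")

# 'replace' substitutes U+FFFD REPLACEMENT CHARACTER for it.
replaced = raw.decode("utf-8", "replace")

# Neither string re-encodes to the original bytes. A stat() or open() on the
# re-encoded name would therefore look up a different (probably missing) file,
# or, under a non-UTF-8 file-system encoding, U+FFFD may fail to encode at all.
print(ignored.encode("utf-8") == raw)    # False
print(replaced.encode("utf-8") == raw)   # False
```

Both variants are fine for display, which is the case argued below, but neither yields a name that can be handed back to the OS.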

i think that depends on the point of view.
if you need to do something later with the content of files, then you're 

but if all you need is to display them for example...

>>> One approach I had been considering is to always make the decoding
>>> succeed, by using the private-use-area of Unicode to represent bytes
>>> that don't decode correctly.
>> hmm..an interesting idea..
>> and what happens with such texts when they are encoded into, let's say,
>> utf-8? are the characters in the private use area ignored?
> UTF-8 supports encoding of all Unicode characters, including the PUA
> blocks.
> py> u"\ue020".encode("utf-8")
> '\xee\x80\xa0'

so basically you'd like to be able to "round-trip"?

so that:

listdir returns a list of filenames, with the unrepresentable bytes 
represented in the PUA.

all the other file-handling functions (stat, open, etc.) recognize such 
strings and handle them correctly.
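
The round-trip scheme summarized above can be sketched in Python 3 as follows. This is an illustration under stated assumptions, not an existing API. The error-handler name "pua-escape", the base code point U+F700, the UTF-8 file-system encoding, and the helpers `to_bytes`, `listdir_pua` and `stat_pua` are all hypothetical. The encode direction cannot be an ordinary codec error handler because, as noted above, UTF-8 encodes PUA characters without error. They have to be mapped back to raw bytes explicitly.

```python
import codecs
import os

# Assumptions (hypothetical, for illustration only):
FS_ENCODING = "utf-8"   # file-system encoding
PUA_BASE = 0xF700       # undecodable byte b -> U+F700 + b (inside the BMP PUA)


def _pua_decode_handler(err):
    """Codec error handler: replace each undecodable byte with a PUA code point."""
    if not isinstance(err, UnicodeDecodeError):
        raise err  # only meant for decoding
    bad = err.object[err.start:err.end]
    return "".join(chr(PUA_BASE + b) for b in bad), err.end


codecs.register_error("pua-escape", _pua_decode_handler)


def to_bytes(name):
    """Map PUA-escaped code points back to raw bytes; encode everything else."""
    out = bytearray()
    for ch in name:
        cp = ord(ch)
        if PUA_BASE <= cp <= PUA_BASE + 0xFF:
            out.append(cp - PUA_BASE)        # restore the original byte
        else:
            out += ch.encode(FS_ENCODING)    # ordinary character
    return bytes(out)


def listdir_pua(path):
    """listdir variant that always returns str, escaping bad bytes into the PUA."""
    return [n.decode(FS_ENCODING, "pua-escape")
            for n in os.listdir(os.fsencode(path))]


def stat_pua(path, name):
    """stat variant that accepts a name produced by listdir_pua."""
    return os.stat(os.path.join(os.fsencode(path), to_bytes(name)))


raw = b"caf\xe9.txt"                          # Latin-1 bytes, invalid as UTF-8
name = raw.decode(FS_ENCODING, "pua-escape")
print(ascii(name))                            # 'caf\uf7e9.txt'
print(to_bytes(name) == raw)                  # True
```

One caveat with this particular mapping: a name whose bytes legitimately decode to a character in U+F700..U+F7FF (for example b"\xef\x9f\xa9", which is U+F7E9) would be turned into the single byte 0xE9 by `to_bytes`. The round trip is therefore only lossless for names that contain no such characters. Python 3 later adopted a related scheme, the `surrogateescape` error handler from PEP 383. It uses lone surrogates U+DC80..U+DCFF instead of the PUA, because a strict UTF-8 decoder never produces them.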


