os.listdir, gets unicode, returns unicode... USUALLY?!?!?

gabor gabor at nekomancer.net
Mon Nov 20 00:23:37 CET 2006


Martin v. Löwis wrote:
> gabor schrieb:
>> 1. simply fix the documentation, and state that if the file-name cannot
>> be decoded into unicode, then it's returned as byte-string. 
> 
> For 2.5, this should be done. Contributions are welcome.
> 
> [...then]
>> [os.path.join(path,n) for n in os.listdir(path)]
>>
>> will not work.
>>
>> 2. add support for some unicode-decoding flags, like i wrote before
> 
> I may have missed something, but did you present a solution that would
> make the case above work?

if we use the same error-handling flags as the byte-string .decode()
method ('ignore', 'replace'), then we could do:

[os.path.join(path,n) for n in os.listdir(path,'ignore')]

or

[os.path.join(path,n) for n in os.listdir(path,'replace')]

it's not an elegant solution, but i think it would solve most of the
problems.
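
a rough sketch of that idea, written as a separate helper because os.listdir takes no such argument. it assumes a current Python 3, where os.listdir called with a bytes path returns bytes names. the names listdir_decoded and decode_names are made up for illustration, and so is the choice of the filesystem encoding as the default:

```python
import os
import sys


def decode_names(raw_names, errors="strict", encoding=None):
    """Decode byte-string file names using the given error handler."""
    if encoding is None:
        # Assumption: fall back to the interpreter's filesystem encoding.
        encoding = sys.getfilesystemencoding()
    return [name.decode(encoding, errors) for name in raw_names]


def listdir_decoded(path, errors="strict", encoding=None):
    """Hypothetical helper, NOT part of the os module.

    Lists `path` as raw bytes, then decodes every name with the same
    `errors` flag that bytes.decode() accepts ('ignore', 'replace', ...).
    """
    raw_names = os.listdir(os.fsencode(path))  # bytes in, bytes out
    return decode_names(raw_names, errors, encoding)
```

with such a helper the join would be written as [os.path.join(path, n) for n in listdir_decoded(path, 'replace')]. the catch is that a name decoded with 'ignore' or 'replace' is no longer the real name on disk, so the joined path is fine for display but may not open the file.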


> 
>> 3. some solution.
> 
> One approach I had been considering is to always make the decoding
> succeed, by using the private-use-area of Unicode to represent bytes
> that don't decode correctly.
> 

hmm..an interesting idea..

and what happens to such text when it is encoded into, let's say,
utf-8? are the private-use-area characters ignored?
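
one way to probe that question is a small experiment. the sketch below assumes a current Python 3 and an invented mapping (each undecodable byte becomes the code point U+E000 plus the byte value); the handler name "pua_escape_demo" and the constant PUA_BASE are made up here, and the thread does not say which mapping was actually meant:

```python
import codecs

PUA_BASE = 0xE000  # assumption: start of the Basic Multilingual Plane private-use area


def _pua_escape(exc):
    """Decode-error handler: map each bad byte to PUA_BASE + byte value."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    replacement = "".join(chr(PUA_BASE + b) for b in bad)
    return replacement, exc.end  # resume decoding after the bad bytes


# Hypothetical handler name, registered only for this demonstration.
codecs.register_error("pua_escape_demo", _pua_escape)

raw = b"caf\xe9.txt"                 # latin-1 bytes, not valid utf-8
text = raw.decode("utf-8", "pua_escape_demo")
encoded = text.encode("utf-8")       # private-use characters encode normally
```

under these assumptions the private-use character is not ignored: it is an ordinary code point as far as the utf-8 codec is concerned, so it comes out as a three-byte sequence. that also means the encoded result differs from the original bytes unless the mapping is reversed before encoding.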

gabor


