os.listdir, gets unicode, returns unicode... USUALLY?!?!?
gabor at nekomancer.net
Fri Nov 17 00:09:56 CET 2006
Martin v. Löwis wrote:
> gabor schrieb:
>> or am i using os.listdir the "wrong way"? how do other people deal with
> You didn't say why the behavior causes a problem for you - you only
> explained what the behavior is.
> Most people use os.listdir in a way like this:
> for name in os.listdir(path):
>     full = os.path.join(path, name)
>     attrib = os.stat(full)
>     if some-condition:
>         f = open(full)
> All this code will typically work just fine with the current behavior,
> so people typically don't see any problem.
i am sorry, but it will not work. actually this is exactly what i did,
and it did not work. it dies in the os.path.join call, where file_name
is converted into unicode. and python uses 'ascii' as the charset in
such cases. but, because listdir already failed to decode the file_name
with the filesystem-encoding, it usually also fails when tried with 'ascii'.
>>> dir_name = u'something'
>>> unicode_file_name = u'\u732b.txt' # the japanese cat-symbol
>>> bytestring_file_name = unicode_file_name.encode('utf-8')
>>> import os.path
>>> os.path.join(dir_name, bytestring_file_name)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.4/posixpath.py", line 65, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 1: ordinal not in range(128)
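
for illustration only, here is a minimal sketch of one way to avoid the implicit 'ascii' conversion: decode any byte-string entries explicitly before they reach os.path.join. the helper name join_names and the 'utf-8' default are assumptions made up for this example, not something os.listdir or os.path provides. note that decoding with 'replace' is lossy, so a name that had to be repaired this way no longer matches the file on disk.

```python
import os


def join_names(dir_name, names, encoding="utf-8"):
    """Join a unicode dir_name with each entry in names.

    Hypothetical helper: entries that are still byte strings (the ones
    listdir could not decode) are decoded explicitly, so the join never
    falls back to the default 'ascii' codec.
    """
    joined = []
    for name in names:
        if isinstance(name, bytes):
            # 'replace' substitutes U+FFFD for undecodable bytes
            # instead of raising UnicodeDecodeError
            name = name.decode(encoding, "replace")
        joined.append(os.path.join(dir_name, name))
    return joined


# mixed input, similar to what listdir can return for a unicode path
entries = [u"\u732b.txt".encode("utf-8"), u"plain.txt"]
paths = join_names(u"something", entries)
```

with this, every element of paths is a unicode string, at the price of guessing the encoding of the undecoded names.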