[Tutor] UTF-8 filenames encountered in os.walk
Kent Johnson
kent37 at tds.net
Tue Jul 3 23:56:49 CEST 2007
William O'Higgins Witteman wrote:
> I have several programs which traverse a Windows filesystem with French
> characters in the filenames.
>
> I am having trouble dealing with these filenames when outputting these
> paths to an XML file - I get UnicodeDecodeError: 'ascii' codec can't
> decode byte 0xe9 ... etc. That happens when I try to convert to UTF-8.
>
> I know that os will give me UTF-8 if I give it UTF-8, and I am trying to
> do that, but somewhere down the line it seems like it reverts to ASCII,
> and then I get these errors.
>
> Has anyone found a silver bullet for ensuring that all the filenames
> encountered by os.walk are treated as UTF-8? Thanks.
Some code would help here, particularly the code that gives you the
error. There are so many ways people get confused by UTF-8 and stumble
over the subtleties of Python's use of Unicode. Note that the error you
quote is a decode error, whereas converting to UTF-8 is encoding. That
suggests a byte string is being implicitly decoded as ASCII somewhere
before it gets encoded.
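
To make the decode/encode distinction concrete, here is a minimal sketch.
It is written in Python 3 syntax, where the two steps are always explicit.
The sample bytes are an assumption: a hypothetical filename "café.txt"
stored in the Windows cp1252 code page, where é is the single byte 0xe9.
The original poster's actual encoding is not known from the message.

```python
# Hypothetical filename bytes: "café.txt" in cp1252 (assumed for
# illustration), where the character é is the single byte 0xe9.
raw = b"caf\xe9.txt"

# Decoding turns bytes into text. ASCII has no meaning for byte 0xe9,
# so this is the step that raises UnicodeDecodeError.
try:
    raw.decode("ascii")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

# Decoding with the codec the bytes were actually written in succeeds.
text = raw.decode("cp1252")

# Encoding turns text into bytes. This is the "convert to UTF-8" step.
utf8_bytes = text.encode("utf-8")
```

In the Python 2 of this thread, calling .encode('utf-8') on a plain byte
string first decodes it implicitly with the ASCII codec, which is one
common way to get a decode error while trying to encode.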
Also it would be helpful to figure out for sure what you are getting
from os.walk(): is it a UTF-8 byte string or a unicode object? The best
way to find out is to
    print repr(filename)
and see what you get on output.
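
The repr check can be applied to a whole tree with something like the
sketch below. It uses Python 3 syntax, and describe_names is a
hypothetical helper name, not part of the os module. What it shows is
that the type of the names coming back from os.walk() follows the type of
the path passed in: a text path gives text names, a bytes path gives
bytes names. In Python 2 the same rule applies with unicode and str.

```python
import os


def describe_names(root):
    """Return (type name, repr) for every filename os.walk finds under root.

    describe_names is a hypothetical helper for illustration only.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            # repr() shows the raw value unambiguously, including any
            # escaped non-ASCII bytes or characters.
            found.append((type(name).__name__, repr(name)))
    return found
```

Calling describe_names("some_dir") and then
describe_names(b"some_dir") on the same directory should make the
difference visible immediately.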
Kent