[Tutor] UTF-8 filenames encountered in os.walk

Tue Jul 3 23:56:49 CEST 2007

William O'Higgins Witteman wrote:
> I have several programs which traverse a Windows filesystem with French
> characters in the filenames.
> 
> I am having trouble dealing with these filenames when outputting these
> paths to an XML file - I get UnicodeDecodeError: 'ascii' codec can't
> decode byte 0xe9 ... etc.  That happens when I try to convert to UTF-8.
> 
> I know that os will give me UTF-8 if I give it UTF-8, and I am trying to
> do that, but somewhere down the line it seems like it reverts to ASCII,
> and then I get these errors.
> 
> Has anyone found a silver bullet for ensuring that all the filenames
> encountered by os.walk are treated as UTF-8?  Thanks.

Some code would help here, particularly the code that gives you the
error. There are so many ways people get confused by UTF-8 and stumble
over the subtleties of Python's use of Unicode. Note that the error you
quote is a decode error, whereas converting to UTF-8 is encoding. That
suggests a byte string is being implicitly decoded as ASCII somewhere
before the encoding step.
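
As a rough illustration of the decode/encode distinction, here is a
minimal sketch. It is written for Python 3 rather than the Python 2 of
this thread, and it assumes the 0xe9 byte is an "é" in Latin-1, which is
only a guess about the original data. The names raw, text and utf8_bytes
are made up for the example.

```python
# 0xe9 is "é" in Latin-1 (an assumption about where the byte came from).
raw = b"caf\xe9"

# Decoding turns bytes into text. ASCII has no byte 0xe9, so this fails
# with the same kind of error quoted in the question.
try:
    raw.decode("ascii")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

# Decoding with a codec that matches the data succeeds.
text = raw.decode("latin-1")

# Encoding turns text into bytes. This is the "convert to UTF-8" step.
utf8_bytes = text.encode("utf-8")

print(error_message)
print(repr(text), repr(utf8_bytes))
```

The point is only that the failure happens on the bytes-to-text side, so
that is where to look for an implicit ASCII conversion.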

Also it would be helpful to figure out for sure what you are getting 
from os.walk(): is it a UTF-8 byte string or a Unicode string? The best
way to find out is to
   print repr(filename)
and see what you get on output.
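
A slightly fuller version of that check might look like the sketch
below. It is written for Python 3, where os.walk returns text names for
a text path and bytes names for a bytes path; in Python 2 the comparable
trick is passing a unicode path to get unicode names back. The helper
walk_reprs and the file name example.txt are hypothetical, just for the
demo.

```python
import os
import tempfile

def walk_reprs(top):
    """Return repr() of every filename os.walk finds under top."""
    found = []
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            found.append(repr(name))
    return found

# Self-contained demo on a throwaway directory (hypothetical file name).
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, "example.txt"), "w").close()

as_text = walk_reprs(demo_dir)                # text path in, text names out
as_bytes = walk_reprs(os.fsencode(demo_dir))  # bytes path in, bytes names out

print(as_text)
print(as_bytes)
```

Running walk_reprs on the real directory should show at a glance which
kind of string the rest of the program is receiving.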

Kent