[Tutor] UTF-8 filenames encountered in os.walk

William O'Higgins Witteman hmm at woolgathering.cx
Wed Jul 4 16:51:11 CEST 2007


On Tue, Jul 03, 2007 at 06:04:16PM -0700, Terry Carroll wrote:
>
>> Has anyone found a silver bullet for ensuring that all the filenames
>> encountered by os.walk are treated as UTF-8?  Thanks.
>
>What happens if you specify the starting directory as a Unicode string, 
>rather than an ascii string, e.g., if you're walking the current 
>directory:
> 
> for thing in os.walk(u'.'):
>
>instead of:
>
> for thing in os.walk('.'): 

This is a good thought, and the crux of the problem.  I pull the
starting directories from an XML file which is UTF-8, but because there
are no extended characters in the starting path, by the time it reaches
my program os.walk treats it as ASCII.  So, I recast the string as
UTF-8, and I get UTF-8 output.  The problem happens further down the
line.
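
A rough sketch of the behaviour Terry is pointing at: os.walk yields
names of the same type as the starting path it is given.  This assumes
a current Python 3 interpreter, where the distinction is bytes vs. str
(in the Python 2 of this thread it is str vs. unicode).  The scratch
directory and the helper name walk_types are made up for illustration.

```python
import os
import tempfile


def walk_types(start):
    """Return the set of types os.walk yields for paths and names under start."""
    seen = set()
    for dirpath, dirnames, filenames in os.walk(start):
        seen.add(type(dirpath))
        for name in dirnames + filenames:
            seen.add(type(name))
    return seen


# Hypothetical scratch directory holding one plain-ASCII file name.
scratch = tempfile.mkdtemp()
open(os.path.join(scratch, "plain.txt"), "w").close()

text_types = walk_types(scratch)               # text starting path
byte_types = walk_types(os.fsencode(scratch))  # byte-string starting path
```

Under those assumptions the type of the starting path, not the
characters in it, decides what comes back.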

I get a list of paths from the results of os.walk, all in UTF-8, but not
identified as such.  If I just pass my list to other parts of the
program, each element seems to be treated as either ASCII or UTF-8,
depending on its contents.  If I try to cast the whole list as UTF-8, I
get an exception because the conversion assumes ASCII and receives UTF-8
for some list elements.

I suspect that my program will have to make sure to recast all
equivalent-to-ASCII strings as UTF-8 while leaving the ones that are
already extended alone.
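
One way that per-element recasting might look, as a minimal sketch
rather than a tested fix for the program described here.  It assumes a
current Python 3 interpreter and that any byte strings in the list
really are UTF-8; the helper name to_text and the sample list are
hypothetical.

```python
def to_text(name, encoding="utf-8"):
    """Decode byte strings with the given encoding; return text unchanged."""
    if isinstance(name, bytes):
        return name.decode(encoding)
    return name


# Hypothetical mixed list: an ASCII-only byte string, a text string,
# and a UTF-8 encoded byte string with an extended character.
mixed = [b"plain.txt", "caf\u00e9.txt", "caf\u00e9.txt".encode("utf-8")]

# Normalize every element once, at the point where the list is built.
normalized = [to_text(p) for p in mixed]
```

Checking the type of each element, instead of converting the whole list
in one go, avoids decoding something that is already text.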
-- 

yours,

William

