os.walk and os.listdir problems python 3.0+

Mark Tolonen metolone+gmane at gmail.com
Thu Jun 25 01:57:02 EDT 2009


"Amos Anderson" <amosanderson at gmail.com> wrote in message 
news:a073a9cf0906242007k5067314dn8e9d7b1c6da6286a at mail.gmail.com...
> I've run into a bit of an issue iterating through files in Python 3.0 and
> 3.1rc2. When it comes to a file with '\u200b' in its name, it gives the
> error...
>
> Traceback (most recent call last):
>  File "ListFiles.py", line 19, in <module>
>    f.write("file:{0}\n".format(i))
>  File "c:\Python31\lib\encodings\cp1252.py", line 19, in encode
>    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\u200b' in
> position 30: character maps to <undefined>
>
> Code is as follows...
> import os
> f = open("dirlist.txt", 'w')
>
> for root, dirs, files in os.walk("C:\\Users\\Filter\\"):
>    f.write("root:{0}\n".format(root))
>    f.write("dirs:\n")
>    for i in dirs:
>        f.write("dir:{0}\n".format(i))
>    f.write("files:\n")
>    for i in files:
>        f.write("file:{0}\n".format(i))
> f.close()
> input("done")
>
> The file it's choking on happens to be a link that Internet Explorer
> created. There are two files that appear in Explorer to have the same name,
> but one actually has a zero-width space ('\u200b') just before the .url
> extension. In playing around with this I've found several files with the
> same character throughout my file system. OS: Vista SP2, Language: US
> English.
>
> Am I doing something wrong or did I find a bug? It's worth noting that
> Python 2.6 just displays this character as a ?, just as it appears if you
> type dir at the Windows command prompt.

In Python 3.x, strings are Unicode.  If you open a text file without 
specifying an encoding, Python encodes what you write using the system's 
default encoding.  On Windows, the filesystem uses Unicode and supports the 
full character set, but the default text file encoding on your system is 
cp1252, which has no mapping for the zero-width space.  Specify an encoding 
that can represent it, such as UTF-8, when opening the output file:

>>> f=open('blah.txt','w',encoding='utf8')
>>> f.write('\u200b')
1
>>> f.close()
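
Applied to the script in the question, the same fix might look like the 
sketch below.  The function name write_listing and its parameters are made 
up for illustration; the last few lines only check that cp1252 rejects 
'\u200b' while UTF-8 accepts it.

```python
import os

def write_listing(top, out_path, encoding="utf-8"):
    """Walk `top` and write the directory and file names to `out_path`.

    Hypothetical helper: the explicit `encoding` is the only real change
    from the script in the question.
    """
    with open(out_path, "w", encoding=encoding) as f:
        for root, dirs, files in os.walk(top):
            f.write("root:{0}\n".format(root))
            f.write("dirs:\n")
            for name in dirs:
                f.write("dir:{0}\n".format(name))
            f.write("files:\n")
            for name in files:
                f.write("file:{0}\n".format(name))

# Why the original script failed: cp1252 has no mapping for U+200B.
zwsp = "\u200b"
try:
    zwsp.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# UTF-8 can encode every Unicode code point, including U+200B.
utf8_bytes = zwsp.encode("utf-8")
```

Calling write_listing("C:\\Users\\Filter\\", "dirlist.txt") should then 
write the listing as UTF-8.  Read the result back with the same encoding; a 
viewer that assumes cp1252 may show the zero-width space as stray characters.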

-Mark




