os.walk and os.listdir problems python 3.0+
amosanderson at gmail.com
Thu Jun 25 10:15:15 EDT 2009
Thank you. That works very well when writing to a text file but what is the
equivalent when writing the information to stdout using print?
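One possible approach for the stdout case, offered as a sketch rather than an answer from the thread: encode each name with the console's encoding and replace whatever it cannot represent, which mirrors the '?' that Python 2.6 shows. The helper name `printable` and the sample file name are hypothetical.

```python
import sys

def printable(text, encoding=None):
    """Return text with characters the target encoding cannot represent replaced by '?'.

    Hypothetical helper, not part of the original thread.
    """
    # Fall back to the console's encoding, or ASCII if none is reported.
    encoding = encoding or sys.stdout.encoding or "ascii"
    # errors="replace" substitutes '?' instead of raising UnicodeEncodeError.
    return text.encode(encoding, errors="replace").decode(encoding)

# Made-up file name containing a zero-width space (U+200B).
print(printable("example\u200b.url"))
```

On a console that can already represent the character (UTF-8, for example) the string passes through unchanged.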
Sorry, when I originally replied I sent it directly and it didn't go to the list.
On Thu, Jun 25, 2009 at 12:57 AM, Mark Tolonen
<metolone+gmane at gmail.com> wrote:
> "Amos Anderson" <amosanderson at gmail.com> wrote in message
> news:a073a9cf0906242007k5067314dn8e9d7b1c6da6286a at mail.gmail.com...
>> I've run into a bit of an issue iterating through files in Python 3.0 and
>> 3.1rc2. When it comes to a file with '\u200b' in the file name it gives
>> this error:
>> Traceback (most recent call last):
>>   File "ListFiles.py", line 19, in <module>
>>   File "c:\Python31\lib\encodings\cp1252.py", line 19, in encode
>>     return codecs.charmap_encode(input,self.errors,encoding_table)
>> UnicodeEncodeError: 'charmap' codec can't encode character '\u200b' in
>> position 30: character maps to <undefined>
>> Code is as follows...
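>> import os
>> f = open("dirlist.txt", 'w')
>> for root, dirs, files in os.walk("C:\\Users\\Filter\\"):
>>     for i in dirs:
>>         f.write(i + "\n")  # loop body lost in the archive; assumed
>>     for i in files:
>>         f.write(i + "\n")  # loop body lost in the archive; assumed
>> f.close()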
>> The file it's choking on happens to be a link that Internet Explorer
>> created. There are two files that appear in Explorer to have the same name
>> but one actually has a zero width space ('\u200b') just before the .url
>> extension. In playing around with this I've found several files with the
>> same character throughout my file system. OS: Vista SP2, Language: US
>> Am I doing something wrong or did I find a bug? It's worth noting that
>> Python 2.6 just displays this character as a ? just as it appears if you
>> type dir at the Windows command prompt.
> In Python 3.x strings default to Unicode. Unless you choose an encoding,
> Python will use the default system encoding to encode the Unicode strings
> into a file. On Windows, the filesystem uses Unicode and supports the full
> character set, but cp1252 (on your system) is the default text file
> encoding, which doesn't support zero-width space. Specify an encoding for
> the output file such as UTF-8:
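The example that followed this suggestion did not survive in the archive. A minimal sketch of the idea, under assumptions: the helper name `write_listing` and the one-path-per-line output format are hypothetical and not taken from the original message.

```python
import os

def write_listing(top, out_path):
    """Write every directory and file path under top to out_path, one per line.

    Hypothetical helper; the name and output format are assumptions.
    """
    # An explicit encoding avoids falling back to the locale default
    # (cp1252 in the report above), which cannot represent U+200B.
    with open(out_path, "w", encoding="utf-8") as f:
        for root, dirs, files in os.walk(top):
            for name in dirs + files:
                f.write(os.path.join(root, name) + "\n")
```

Calling `write_listing("C:\\Users\\Filter\\", "dirlist.txt")` would correspond to the original script. The resulting file has to be read back as UTF-8 as well.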