os.walk and os.listdir problems python 3.0+

Amos Anderson amosanderson at gmail.com
Thu Jun 25 10:15:15 EDT 2009


Thank you. That works very well when writing to a text file, but what is the
equivalent when writing the information to stdout using print?

Sorry, when I originally replied I sent it directly, so it didn't go to the
list.
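
For the stdout case asked about above, one possible approach (a sketch, not
something taken from this thread) is to pass each name through the console's
own encoding with an error handler, so unencodable characters are shown as
escapes instead of raising UnicodeEncodeError. The helper name safe_text and
the sample file name are hypothetical.

```python
import sys


def safe_text(text, encoding):
    """Return text with characters the encoding cannot represent
    replaced by backslash escapes such as \\u200b."""
    return text.encode(encoding, "backslashreplace").decode(encoding)


if __name__ == "__main__":
    name = "example\u200b.url"  # hypothetical file name, for illustration
    # sys.stdout.encoding can be None when output is redirected
    console_encoding = sys.stdout.encoding or "ascii"
    print("file:{0}".format(safe_text(name, console_encoding)))
```

On a cp1252 console this would print the zero width space as a visible
escape; on a UTF-8 console the name passes through unchanged.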

On Thu, Jun 25, 2009 at 12:57 AM, Mark Tolonen
<metolone+gmane at gmail.com> wrote:

>
> "Amos Anderson" <amosanderson at gmail.com> wrote in message
> news:a073a9cf0906242007k5067314dn8e9d7b1c6da6286a at mail.gmail.com...
>
>> I've run into a bit of an issue iterating through files in Python 3.0 and
>> 3.1rc2. When it comes to files with '\u200b' in the file name, it gives
>> the error...
>>
>> Traceback (most recent call last):
>>   File "ListFiles.py", line 19, in <module>
>>     f.write("file:{0}\n".format(i))
>>   File "c:\Python31\lib\encodings\cp1252.py", line 19, in encode
>>     return codecs.charmap_encode(input,self.errors,encoding_table)[0]
>> UnicodeEncodeError: 'charmap' codec can't encode character '\u200b' in
>> position 30: character maps to <undefined>
>>
>> Code is as follows...
>> import os
>> f = open("dirlist.txt", 'w')
>>
>> for root, dirs, files in os.walk("C:\\Users\\Filter\\"):
>>   f.write("root:{0}\n".format(root))
>>   f.write("dirs:\n")
>>   for i in dirs:
>>       f.write("dir:{0}\n".format(i))
>>   f.write("files:\n")
>>   for i in files:
>>       f.write("file:{0}\n".format(i))
>> f.close()
>> input("done")
>>
>> The file it's choking on happens to be a link that Internet Explorer
>> created. There are two files that appear in Explorer to have the same name,
>> but one actually has a zero width space ('\u200b') just before the .url
>> extension. In playing around with this I've found several files with the
>> same character throughout my file system. OS: Vista SP2, Language: US
>> English.
>>
>> Am I doing something wrong or did I find a bug? It's worth noting that
>> Python 2.6 just displays this character as a ?, just as it appears if you
>> type dir at the Windows command prompt.
>>
>
> In Python 3.x strings default to Unicode.  Unless you choose an encoding,
> Python will use the default system encoding to encode the Unicode strings
> into a file.  On Windows, the filesystem uses Unicode and supports the full
> character set, but cp1252 (on your system) is the default text file
> encoding, which doesn't support zero-width space.  Specify an encoding for
> the output file such as UTF-8:
>
> >>> f = open('blah.txt', 'w', encoding='utf8')
> >>> f.write('\u200b')
> 1
> >>> f.close()
>
> -Mark
>
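
To make the explanation above concrete, here is a minimal sketch that checks
the two claims: cp1252 has no mapping for the zero width space, while UTF-8
encodes it and a file opened with encoding='utf-8' round-trips it. The file
name and the temporary output path are hypothetical and only for
illustration.

```python
import os
import tempfile

name = "example\u200b.url"  # hypothetical file name containing U+200B

# cp1252 cannot represent U+200B, so encoding raises UnicodeEncodeError.
try:
    name.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# UTF-8 can represent every Unicode code point.
utf8_bytes = name.encode("utf-8")

# Writing with an explicit encoding avoids relying on the system default.
path = os.path.join(tempfile.mkdtemp(), "dirlist.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("file:{0}\n".format(name))

# Read it back with the same encoding to confirm nothing was lost.
with open(path, encoding="utf-8") as f:
    round_trip = f.read()

print(cp1252_ok, utf8_bytes, repr(round_trip))
```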