python 3.1 unicode question
jeffunit
jeff at jeffunit.com
Wed Sep 16 00:48:10 EDT 2009
At 09:25 PM 9/15/2009, Mark Tolonen wrote:
>"jeffunit" <jeff at jeffunit.com> wrote in message
>news:20090915144123964.LJKA6569 at cdptpa-omta01.mail.rr.com...
>>I wrote a program that diffs files and prints out matching file names.
>>I will be executing the output with sh, to delete select files.
>>
>>Most of the file names are plain ASCII, but about 10% of them have Unicode
>>characters in them. When I try to print the string containing the name, I get
>>an exception:
>>
>>'ascii' codec can't encode character '\udce9'
>>in position 37: ordinal not in range(128)
>>
>>The string is:
>>
>>'./Julio_Iglesias-Un_Hombre_Solo-05-Qu\udce9_no_se_rompa_la_noche.mp3'
>>
>>This is on a Windows XP system, using Python 3.1, which I compiled
>>with the Cygwin Linux compatibility layer.
>>
>>Can you tell me what encoding I need to print \udce9 and how to set python to
>>that encoding mode?
>
>That looks like a "surrogate escape" (See PEP 383)
>http://www.python.org/dev/peps/pep-0383/. It indicates the wrong
>encoding was used to decode the filename.
That seems likely. How do I set the encoding to something correct to
decode the filename? Clearly Windows knows how to display it.
I suspect that since I compiled Python with Cygwin, it is using a POSIX
standard rather than a Windows-specific one. Ideally, I would like my
code to work on Linux as well as Windows, as I back up all of my data to
a Linux machine with Samba.
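
For reference, a minimal sketch of how a PEP 383 surrogate escape can be reversed by hand. It rests on two assumptions not confirmed in this thread: that Python decoded the name as ASCII (suggested by the 'ascii' codec in the error message), and that the name is really stored in a Latin-1-style code page such as cp1252, where byte 0xE9 is "e" with an acute accent. The variable names are illustrative only.

```python
# The name as Python returned it: the byte 0xE9 could not be decoded,
# so PEP 383 mapped it to the lone surrogate U+DCE9.
name = './Julio_Iglesias-Un_Hombre_Solo-05-Qu\udce9_no_se_rompa_la_noche.mp3'

# Step 1: undo the escaping to get back the exact bytes of the file name.
# ASSUMPTION: the original (wrong) decoding used ASCII.
raw = name.encode('ascii', 'surrogateescape')

# Step 2: decode those bytes with the encoding the name was really stored in.
# ASSUMPTION: cp1252; substitute the actual code page if it differs.
fixed = raw.decode('cp1252')

# ascii() renders non-ASCII characters as escapes, so this print works
# even when stdout itself can only encode ASCII.
print(ascii(fixed))
```

If the guess about the encoding is wrong, step 2 either raises UnicodeDecodeError or yields the wrong character, so it is worth checking the result against what Windows displays for the same file.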
thanks,
jeff