python 3.1 unicode question

Wed Sep 16 01:07:40 EDT 2009

On Tue, Sep 15, 2009 at 9:48 PM, jeffunit <jeff at jeffunit.com> wrote:
> At 09:25 PM 9/15/2009, Mark Tolonen wrote:
>>
>> "jeffunit" <jeff at jeffunit.com> wrote in message
>> news:20090915144123964.LJKA6569 at cdptpa-omta01.mail.rr.com...
>>>
>>> I wrote a program that diffs files and prints out matching file names.
>>> I will be executing the output with sh, to delete select files.
>>>
>>> Most of the file names are plain ASCII, but about 10% of them have
>>> Unicode characters in them. When I try to print the string containing
>>> the name, I get an exception:
>>>
>>> 'ascii' codec can't encode character '\udce9' in position 37: ordinal not in range(128)
>>>
>>> The string is:
>>>
>>> './Julio_Iglesias-Un_Hombre_Solo-05-Qu\udce9_no_se_rompa_la_noche.mp3'
>>>
>>> This is on a Windows XP system, using Python 3.1, which I compiled
>>> with the Cygwin Linux compatibility layer.
>>>
>>> Can you tell me what encoding I need to print \udce9 and how to set
>>> Python to that encoding mode?
>>
>> That looks like a "surrogate escape" (see PEP 383,
>> http://www.python.org/dev/peps/pep-0383/). It indicates that the wrong
>> encoding was used to decode the filename.
>
> That seems likely. How do I set the encoding to something correct to decode
> the filename?
>
> Clearly Windows knows how to display it.
> I suspect that, since I compiled Python with Cygwin, it is using a POSIX
> standard rather than a Windows-specific one. Of course, ideally I would
> like my code to work on Linux as well as Windows, as I back up all of my
> data to a Linux machine with Samba.
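A minimal sketch of undoing the surrogate escape, under two assumptions: that Python decoded the name with ASCII (the exception names the 'ascii' codec), and that the name on disk is really Latin-1 or cp1252, where byte 0xE9 is "é". Under PEP 383, each undecodable byte b becomes the lone surrogate U+DC00+b, so '\udce9' stands for byte 0xE9. `recover_name` is a hypothetical helper, not a library function.

```python
def recover_name(name, fs_encoding="ascii", real_encoding="latin-1"):
    """Turn a surrogate-escaped str back into raw bytes, then decode those
    bytes with the encoding the name was (assumed to be) written in."""
    # The surrogateescape handler maps each lone surrogate U+DC80..U+DCFF
    # back to the single byte it stands for (U+DCE9 -> b'\xe9').
    raw = name.encode(fs_encoding, "surrogateescape")
    # real_encoding is an assumption; Latin-1 decodes every byte value.
    return raw.decode(real_encoding)

escaped = "./Julio_Iglesias-Un_Hombre_Solo-05-Qu\udce9_no_se_rompa_la_noche.mp3"
fixed = recover_name(escaped)
# ascii() escapes non-ASCII characters, so this print cannot raise
# UnicodeEncodeError even on an ASCII-only terminal.
print(ascii(fixed))
```

If the guess about `real_encoding` is wrong, the result will show the wrong accented letters rather than raising, so the output is worth checking by eye.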

Have you perhaps tried using the native Windows version of Python?

Cheers,
Chris
--
http://blog.rebertia.com
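For the original goal of generating a script for sh to delete files, one option is to avoid decoding names at all. The sketch below assumes the names are obtained as bytes (for example from os.listdir(b'.')) and that a POSIX shell runs the output. It emits them with single-quote shell quoting. `shell_quote_bytes` and `rm_script` are hypothetical helpers.

```python
def shell_quote_bytes(path):
    """Quote a bytes path for a POSIX shell by wrapping it in single quotes.

    Inside single quotes every byte is literal except ' itself, which is
    written as '\\'' (close quote, escaped quote, reopen quote).
    """
    return b"'" + path.replace(b"'", b"'\\''") + b"'"

def rm_script(names):
    """Return a shell script, as bytes, with one 'rm --' line per name."""
    # Only bytes are joined here, so the on-disk bytes pass through
    # unchanged; "--" stops rm from treating a leading '-' as an option.
    return b"".join(b"rm -- " + shell_quote_bytes(n) + b"\n" for n in names)
```

In the poster's program one might write the result with `sys.stdout.buffer.write(rm_script(names))`, which bypasses the text layer and its 'ascii' encoder. The shell then receives exactly the bytes the directory listing produced.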