python 3.1 unicode question

Chris Rebert clp2 at
Wed Sep 16 07:07:40 CEST 2009

On Tue, Sep 15, 2009 at 9:48 PM, jeffunit <jeff at> wrote:
> At 09:25 PM 9/15/2009, Mark Tolonen wrote:
>> "jeffunit" <jeff at> wrote in message
>> news:20090915144123964.LJKA6569 at
>>> I wrote a program that diffs files and prints out matching file names.
>>> I will be executing the output with sh, to delete select files.
>>> Most of the file names are plain ASCII, but about 10% of them have Unicode
>>> characters in them. When I try to print the string containing the name, I
>>> get an exception:
>>> 'ascii' codec can't encode character '\udce9'
>>> in position 37: ordinal not in range(128)
>>> The string is:
>>> './Julio_Iglesias-Un_Hombre_Solo-05-Qu\udce9_no_se_rompa_la_noche.mp3'
>>> This is on a Windows XP system, using Python 3.1, which I compiled with
>>> Cygwin, the Linux compatibility layer.
>>> Can you tell me what encoding I need to print \udce9, and how to set
>>> Python to that encoding mode?
>> That looks like a "surrogate escape" (see PEP 383). It indicates the wrong
>> encoding was used to decode the filename.
> That seems likely. How do I set the encoding to something correct to decode
> the filename?
> Clearly Windows knows how to display it.
> I suspect that since I compiled Python with Cygwin, it is using a POSIX
> standard rather than a Windows-specific one. Ideally, I would like my code
> to work on Linux as well as Windows, as I back up all of my data to a Linux
> machine with Samba.

Have you perhaps tried using the native Windows version of Python?
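
If switching builds is not an option, here is a rough sketch of how the PEP 383 round trip can be undone by hand. It assumes the name was decoded as ASCII and that the bytes on disk are really cp1252; neither is confirmed anywhere in this thread, and the helper name recover() is made up for illustration.

```python
# Bytes as they might be stored on disk: 0xE9 is 'é' in cp1252 / Latin-1.
# (Assumption: the thread never states the real on-disk encoding.)
raw = b"Qu\xe9_no_se_rompa_la_noche.mp3"

# Decoding with a codec that cannot handle 0xE9, using the PEP 383
# "surrogateescape" error handler, maps the byte to the lone surrogate U+DCE9.
name = raw.decode("ascii", errors="surrogateescape")


def recover(name, real_encoding="cp1252", wrong_encoding="ascii"):
    """Hypothetical helper: undo the wrong decode, then redo it with the
    encoding we assume the file name was actually written in."""
    # Encoding with the same handler turns U+DCE9 back into the byte 0xE9.
    original_bytes = name.encode(wrong_encoding, errors="surrogateescape")
    return original_bytes.decode(real_encoding)


fixed = recover(name)

# ascii() escapes non-ASCII characters, so this print is safe even when
# stdout itself can only encode ASCII.
print(ascii(fixed))
```

For output that is going to be fed to sh, it may be simpler to write the original bytes straight to sys.stdout.buffer, so the shell sees exactly what is on disk regardless of which encoding it turns out to be.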

