Re: problems with Â character

Bengt Richter bokr at
Wed Mar 23 05:28:39 CET 2005

On Tue, 22 Mar 2005 20:09:55 -0600, "John Roth" <newsgroups at> wrote:

>I had this problem recently. It turned out that something
>had encoded a unicode string into utf-8. When I found
>the culprit and fixed the underlying design issue, it went away.
>John Roth
>"jdonnell" <jaydonnell at> wrote in message 
>news:1111521139.657563.55410 at
>I have a mysql database with characters like Â » in it. I'm
>trying to write a python script to remove these, but I'm having a
>really hard time.
>These strings are coming out as type 'str' not 'unicode' so I tried to
>record[4].replace('Â', '')
>but this does nothing. However the following code works
>s = 'aaaaa Â aaa'
>print type(s)
>print s
>print s.find('Â')
>This returns
><type 'str'>
>aaaaa Â aaa
>The other odd thing is that the Â character shows up as two spaces if
>I print it to the terminal from mysql, but it shows up as Â when I
>print from the simple script above.
>What am I doing wrong?
What encodings are involved? 
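
To make the question concrete, here is a minimal sketch of one way a stray Â can appear. It is written in Python 3 syntax (the sessions in this thread are Python 2), the sample value is made up, and whether this is what happened in the database is only an assumption. The character » encoded as UTF-8 is the two bytes C2 BB; if those bytes are later read as Latin-1, they show up as Â followed by ».

```python
# Hypothetical sample value; the real column contents are not shown in the thread.
original = "a \u00bb b"                      # 'a » b'

utf8_bytes = original.encode("utf-8")        # '»' becomes the bytes C2 BB
mojibake = utf8_bytes.decode("latin-1")      # same bytes misread as Latin-1: 'a Â» b'

# Undo the misreading as a whole instead of deleting single characters.
repaired = mojibake.encode("latin-1").decode("utf-8")

# str.replace returns a new string, so the result has to be assigned to take effect.
stripped = mojibake.replace("\u00c2", "")
```

If the data really is UTF-8 read as Latin-1, the round trip in `repaired` is probably safer than stripping Â, since it also restores other accented characters. Note too that a bare `record[4].replace(...)` with the result discarded would look like it "does nothing".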

This is from idle on windows, which seems to display latin-1 source ok:
 >>> "Latin-1:»\n".decode('latin-1')
 >>> "Latin-1:»\n".decode('latin-1').encode('cp437', 'replace')
 >>> "Latin-1:»\n".decode('latin-1').encode('cp437', 'ignore')
 >>> u'Latin-1:\xc2\xbb\n'.encode('cp437','replace')
Now this is in an NT4 console window with code page 437:

 >>> u'Latin-1:\xc2\xbb\n'.encode('cp437','replace')
 >>> import sys
 >>> sys.stdout.write(u'Latin-1:\xc2\xbb\n'.encode('cp437','replace'))

Notice that the interactive output does a repr that creates the \xaf, but
the character is available and can be written non-repr'd via sys.stdout.write.
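
The repr-versus-write distinction can be sketched like this, in Python 3 syntax rather than the Python 2 used in the sessions above. The variable names are mine. Encoding to cp437 turns » into the single byte 0xAF, and Â, which cp437 lacks, into '?'. The interactive prompt echoes the repr, which spells that byte out as the four characters \xaf.

```python
# Python 3 sketch; names are illustrative, not from the original session.
encoded = "Latin-1:\u00c2\u00bb\n".encode("cp437", "replace")

# What an interactive prompt would echo: a printable description of the bytes.
shown = repr(encoded)

# The byte that the repr spells out as a backslash escape.
guillemet_byte = encoded[-2]
```

In Python 3, sending the raw bytes to a console would go through `sys.stdout.buffer.write(encoded)` rather than `sys.stdout.write`, which accepts only text there.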

For the heck of it:

 >>> sys.stdout.write(u'Latin-1:\xc2\xbb\n'.encode('cp437','xmlcharrefreplace'))

I don't know if this is going to get through to your screen ;-)
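
For comparison, a short sketch (again Python 3 syntax, my own variable names) of how the error handlers used above treat the one character cp437 cannot represent. The sample text is the same string as in the sessions.

```python
# Python 3 sketch comparing codec error handlers on the same text.
text = "Latin-1:\u00c2\u00bb\n"

results = {}
for handler in ("replace", "ignore", "xmlcharrefreplace"):
    # 'Â' (U+00C2) has no cp437 code point; '»' maps to the byte 0xAF.
    results[handler] = text.encode("cp437", handler)

try:
    text.encode("cp437")          # the default handler is "strict"
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

Here "replace" substitutes '?', "ignore" drops the character, and "xmlcharrefreplace" writes the numeric reference &#194; in its place, while the default strict mode raises UnicodeEncodeError.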

Bengt Richter
