Re: problems with Â character

Bengt Richter bokr at oz.net
Wed Mar 23 05:28:39 CET 2005


On Tue, 22 Mar 2005 20:09:55 -0600, "John Roth" <newsgroups at jhrothjr.com> wrote:

>I had this problem recently. It turned out that something
>had encoded a unicode string into utf-8. When I found
>the culprit and fixed the underlying design issue, it went away.
>
>John Roth
>
>
>
>"jdonnell" <jaydonnell at gmail.com> wrote in message 
>news:1111521139.657563.55410 at o13g2000cwo.googlegroups.com...
>I have a mysql database with characters like      » in it. I'm
>trying to write a python script to remove these, but I'm having a
>really hard time.
>
>These strings are coming out as type 'str' not 'unicode' so I tried to
>just
>
>record[4].replace('Â', '')
>
>but this does nothing. However the following code works
>
>#!/usr/bin/python
>
>s = 'aaaaa Â aaa'
>print type(s)
>print s
>print s.find('Â')
>
>This returns
><type 'str'>
>aaaaa Â aaa
>6
>
>The other odd thing is that the Â character shows up as two spaces if
>I print it to the terminal from mysql, but it shows up as Â when I
>print from the simple script above.
>What am I doing wrong?
>
What encodings are involved? 
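To make the question concrete, here is a minimal sketch in Python 3 (the sessions below are Python 2) of the mismatch John describes. It assumes the stored text is UTF-8 that something later decodes as Latin-1: the two UTF-8 bytes of » then come back as the two characters Â». `fix_mojibake` is a hypothetical helper, not part of any library.

```python
def fix_mojibake(text):
    """Hypothetical helper: undo one round of UTF-8 bytes decoded as Latin-1."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not this particular kind of mojibake; return the text unchanged.
        return text


original = "Latin-1:\u00bb"          # the intended text, ending in »
raw = original.encode("utf-8")       # UTF-8 stores » as two bytes: \xc2\xbb
garbled = raw.decode("latin-1")      # Latin-1 maps each byte to one character
print(ascii(garbled))                # 'Latin-1:\xc2\xbb', i.e. "Latin-1:Â»"

# Stripping the stray U+00C2 also works, but str.replace returns a new
# string rather than changing the old one, so the result must be assigned.
stripped = garbled.replace("\u00c2", "")
print(stripped == original)                # True
print(fix_mojibake(garbled) == original)   # True
```

Re-encoding as Latin-1 and decoding as UTF-8 only repairs text that went through exactly one such wrong decode; the lasting fix is still to find the step that uses the wrong codec.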

This is from IDLE on Windows, which seems to display Latin-1 source OK:
 ----
 >>> "Latin-1:Â»\n".decode('latin-1')
 u'Latin-1:\xc2\xbb\n'
 >>> "Latin-1:Â»\n".decode('latin-1').encode('cp437', 'replace')
 'Latin-1:?\xaf\n'
 >>> "Latin-1:Â»\n".decode('latin-1').encode('cp437', 'ignore')
 'Latin-1:\xaf\n'
 >>> u'Latin-1:\xc2\xbb\n'.encode('cp437','replace')
 'Latin-1:?\xaf\n'
 >>> 
 ----
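For readers on Python 3, where bytes and text are separate types, the IDLE session above can be reproduced roughly as follows. This sketch assumes the source bytes were the UTF-8 encoding of "Latin-1:»", which is what the u'Latin-1:\xc2\xbb\n' result implies.

```python
# The bytes behind the literal in the session above: UTF-8 for "Latin-1:»\n".
data = b"Latin-1:\xc2\xbb\n"

# Decoding as Latin-1 maps each byte to the code point with the same value.
text = data.decode("latin-1")
print(ascii(text))                   # 'Latin-1:\xc2\xbb\n'

# cp437 has no U+00C2 ("Â"), but it does have U+00BB ("»") at byte 0xAF.
replaced = text.encode("cp437", "replace")
ignored = text.encode("cp437", "ignore")
print(replaced)                      # b'Latin-1:?\xaf\n'
print(ignored)                       # b'Latin-1:\xaf\n'
```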
Now this is in an NT4 console window with code page 437:

 ----
 >>> u'Latin-1:\xc2\xbb\n'.encode('cp437','replace')
 'Latin-1:?\xaf\n'
 >>> import sys
 >>> sys.stdout.write(u'Latin-1:\xc2\xbb\n'.encode('cp437','replace'))
 Latin-1:?»
 ----

Notice that the interactive prompt echoes the repr of the result, which is what
shows the byte as \xaf; the character itself is there and can be written
without the repr via sys.stdout.write.
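The same distinction exists in Python 3. A minimal sketch, assuming sys.stdout is the usual text wrapper with a binary .buffer underneath (a tool that captures output may replace it with an object that has none, hence the check):

```python
import sys

encoded = "Latin-1:\xc2\xbb\n".encode("cp437", "replace")

# What the interactive prompt would echo: the repr, with byte 0xAF escaped.
print(repr(encoded))                 # b'Latin-1:?\xaf\n'

# Writing the raw bytes instead hands 0xAF itself to the terminal, which a
# cp437 console draws as "»". Python 3's text-mode sys.stdout only accepts
# str, so bytes go through its binary .buffer layer when one exists.
raw_out = getattr(sys.stdout, "buffer", None)
if raw_out is not None:
    sys.stdout.flush()               # keep the text output above in order
    raw_out.write(encoded)
    raw_out.flush()
```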

For the heck of it:

 >>> sys.stdout.write(u'Latin-1:\xc2\xbb\n'.encode('cp437','xmlcharrefreplace'))
 Latin-1:&#194;»

I don't know if this is going to get through to your screen ;-)
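For completeness, a Python 3 sketch of the xmlcharrefreplace case. Unlike replace and ignore it loses nothing, which the round trip through the standard library's html.unescape illustrates (on real text html.unescape would also expand any other & sequences, so treat this as a demonstration only):

```python
import html

text = "Latin-1:\xc2\xbb\n"

# "xmlcharrefreplace" substitutes an XML numeric character reference for
# every character the codec cannot represent. cp437 lacks U+00C2, which is
# 194 in decimal, so it becomes "&#194;"; "»" still encodes to byte 0xAF.
encoded = text.encode("cp437", "xmlcharrefreplace")
print(encoded)                       # b'Latin-1:&#194;\xaf\n'

# Decoding the bytes and unescaping the reference recovers the original.
restored = html.unescape(encoded.decode("cp437"))
print(restored == text)              # True
```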

Regards,
Bengt Richter



More information about the Python-list mailing list