Re: problems with  character
Bengt Richter
bokr at oz.net
Tue Mar 22 23:28:39 EST 2005
On Tue, 22 Mar 2005 20:09:55 -0600, "John Roth" <newsgroups at jhrothjr.com> wrote:
>I had this problem recently. It turned out that something
>had encoded a unicode string into utf-8. When I found
>the culprit and fixed the underlying design issue, it went away.
>
>John Roth
>
>
>
>"jdonnell" <jaydonnell at gmail.com> wrote in message
>news:1111521139.657563.55410 at o13g2000cwo.googlegroups.com...
>I have a mysql database with characters like  » in it. I'm
>trying to write a python script to remove these, but I'm having a
>really hard time.
>
>These strings are coming out as type 'str' not 'unicode' so I tried to
>just
>
>record[4].replace('Â', '')
>
>but this does nothing. However the following code works
>
>#!/usr/bin/python
>
>s = 'aaaaa  aaa'
>print type(s)
>print s
>print s.find('Â')
>
>This returns
><type 'str'>
>aaaaa  aaa
>6
>
>The other odd thing is that the  character shows up as two spaces if
>I print it to the terminal from mysql, but it shows up as  when I
>print from the simple script above.
>What am I doing wrong?
>
What encodings are involved?
This is from IDLE on Windows, which seems to display latin-1 source ok:
----
>>> "Latin-1:»\n".decode('latin-1')
u'Latin-1:\xc2\xbb\n'
>>> "Latin-1:»\n".decode('latin-1').encode('cp437', 'replace')
'Latin-1:?\xaf\n'
>>> "Latin-1:»\n".decode('latin-1').encode('cp437', 'ignore')
'Latin-1:\xaf\n'
>>> u'Latin-1:\xc2\xbb\n'.encode('cp437','replace')
'Latin-1:?\xaf\n'
>>>
----
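One possible reading of the session above, assuming the stored bytes are UTF-8 (an assumption, nothing in the thread confirms it): the pair \xc2\xbb is the UTF-8 encoding of the single character », so decoding as latin-1 produces two characters where utf-8 produces one. A minimal sketch, with made-up variable names:

```python
# Assumption: the bytes coming out of the database are UTF-8.
# `raw` is a made-up sample, not data from the original poster.
raw = b'Latin-1:\xc2\xbb\n'

as_latin1 = raw.decode('latin-1')   # every byte becomes its own character
as_utf8 = raw.decode('utf-8')       # \xc2\xbb collapses into U+00BB

print(len(as_latin1))   # 11
print(len(as_utf8))     # 10

# U+00BB exists in cp437 (as byte 0xAF), so no error handler is needed here.
for_console = as_utf8.encode('cp437')
```

If that assumption holds, there is no stray character to strip out; the extra one only appears when the bytes are decoded with the wrong codec.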
Now this is in an NT4 console window with code page 437:
----
>>> u'Latin-1:\xc2\xbb\n'.encode('cp437','replace')
'Latin-1:?\xaf\n'
>>> import sys
>>> sys.stdout.write(u'Latin-1:\xc2\xbb\n'.encode('cp437','replace'))
Latin-1:?»
----
Notice that the interactive output does a repr that creates the \xaf, but
the character is available and can be written non-repr'd via sys.stdout.write.
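To make the repr point concrete, here is a small sketch that compares the escaped form with what a cp437 console would draw for the same bytes. Decoding with cp437 stands in for the console here, which is my simplification; the variable names are made up:

```python
encoded = u'Latin-1:\xc2\xbb\n'.encode('cp437', 'replace')

# What the interactive prompt echoes: the non-ASCII byte is escaped as \xaf.
escaped = repr(encoded)
print(escaped)

# What a cp437 console would draw for the same bytes: 0xAF is U+00BB there.
on_console = encoded.decode('cp437')
print(on_console == u'Latin-1:?\xbb\n')   # True
```

The bytes are the same in both cases; only the way they are shown differs.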
For the heck of it:
>>> sys.stdout.write(u'Latin-1:\xc2\xbb\n'.encode('cp437','xmlcharrefreplace'))
Latin-1:&#194;»
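For comparison, a short sketch that runs the three error handlers used in this post side by side. U+00C2 has no cp437 equivalent, so it is the character each handler has to deal with; the `results` dict is just for illustration:

```python
text = u'Latin-1:\xc2\xbb\n'   # U+00C2 is not in cp437; U+00BB maps to 0xAF

results = {}
for handler in ('replace', 'ignore', 'xmlcharrefreplace'):
    # 'replace' substitutes '?', 'ignore' drops the character,
    # 'xmlcharrefreplace' writes a numeric reference (&#194;) instead.
    results[handler] = text.encode('cp437', handler)
    print('%-18s %r' % (handler, results[handler]))
```

Without an error handler the default is 'strict', which raises UnicodeEncodeError on the same input.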
I don't know if this is going to get through to your screen ;-)
Regards,
Bengt Richter