Characters aren't displayed correctly

Hussein B hubaghdadi at gmail.com
Tue Mar 3 04:05:31 EST 2009


On Mar 2, 5:40 pm, John Machin <sjmac... at lexicon.net> wrote:
> On Mar 3, 1:50 am, Hussein B <hubaghd... at gmail.com> wrote:
>
>
>
> > On Mar 2, 4:31 pm, John Machin <sjmac... at lexicon.net> wrote:
> > > On Mar 2, 7:30 pm, Hussein B <hubaghd... at gmail.com> wrote:
>
> > > > On Mar 1, 4:51 pm, Philip Semanchuk <phi... at semanchuk.com> wrote:
>
> > > > > On Mar 1, 2009, at 8:31 AM, Hussein B wrote:
>
> > > > > > Hey,
> > > > > > > I'm retrieving records from a MySQL database that contains
> > > > > > > non-English characters.
>
> > > Can you reveal which language???
>
> > Arabic
>
> > > > > > Then I create a String that contains HTML markup and column values
> > > > > > from the previous result set.
> > > > > > +++++
> > > > > > markup = u'''<table>.....'''
> > > > > > for row in rows:
> > > > > >     markup = markup + '<tr><td>' + row['id']
> > > > > > > markup = markup + '</table>'
> > > > > > +++++
> > > > > > Then I'm sending the email according to this tip:
> > > > > > > http://code.activestate.com/recipes/473810/
> > > > > > > Well, the email contains ????? characters in place of the non-English ones.
> > > > > > Any ideas?
>
> > > > > > There are so many places where this could go wrong and you haven't
> > > > > > narrowed down the problem.
>
> > > > > Are the characters stored in the database correctly?
>
> > > > Yes they are.
>
> > > How do you KNOW that they are stored correctly? What makes you so
> > > sure?
>
> > Because MySQL Query Browser displays them correctly. In addition, I use
> > BIRT as the reporting system and it shows them correctly.
>
> > > > > Are they stored consistently (i.e. all using the same encoding, not  
> > > > > some using utf-8 and others using iso-8859-1)?
>
> > > > Yes.
>
> > > So what is the encoding used to store them?
>
> > The tables are created with the UTF-8 encoding option.
>
> > > > > What are you getting out of the database? Is it being converted to  
> > > > > Unicode correctly, or at all?
>
> > > > I don't know. How can I check this?
>
> > > You could show us some of the output from the database query. As well
> > > as
> > >    print the_output
> > > you should
> > >    print repr(the_output)
> > > and show us both, and also tell us what you *expect* to see.
>
> > The result of print repr(row['name']) is '??? ??????'
> > The '?' characters are supposed to be Arabic characters.
>
> Are you expecting 3 Arabic characters, a space, and then 6 Arabic
> characters?
>
> We now have some interesting evidence: row['name'] is NOT a unicode
> object -- otherwise the print would show u'??? ??????'; it's a str
> object.
>
> So: A utf8-encoded string is being decoded to unicode, and then re-
> encoded to some other encoding, using the "replace" (with "?") error-
> handling method. That shouldn't be hard to spot! It's about time you
> showed us the code you are using to extract the data from the
> database, including the print statements you have put in.
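
A minimal sketch of the effect described above, in case it helps to see it without a database. It assumes Python 3 syntax (the thread itself is Python 2), a made-up sample name, and latin-1 as the "other encoding"; the actual target encoding in this setup is not known.

```python
# Hypothetical sample value: two Arabic words of 3 and 4 letters.
# The real column contents are not shown in the thread.
name = "\u0639\u0644\u064a \u0645\u062d\u0645\u062f"

# What a UTF-8 column would hold as raw bytes.
utf8_bytes = name.encode("utf-8")

# Step 1: the bytes are decoded to text correctly.
decoded = utf8_bytes.decode("utf-8")

# Step 2: the text is re-encoded to a charset that has no Arabic letters
# (latin-1 is an assumption), replacing each unencodable character with "?".
mangled = decoded.encode("latin-1", "replace")

print(repr(mangled))  # b'??? ????'
```

Once the "?" bytes have been produced, the original letters cannot be recovered from them, so the conversion has to be avoided rather than undone.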

This is how I retrieve the data:

db = MySQLdb.connect(host = "127.0.0.1", port = 3306, user = "username",
                     passwd = "passwd", db = "reporting")
cr = db.cursor(MySQLdb.cursors.DictCursor)
cr.execute(sql)
rows = cr.fetchall()
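
One thing that may be worth checking, as an assumption rather than a confirmed diagnosis: the connect() call above does not set a connection character set. MySQLdb's connect() accepts the keyword arguments charset and use_unicode. The sketch below reuses the connection values from the snippet above; the helper names utf8_connect_kwargs and fetch_rows are hypothetical, and it has not been run against this server.

```python
def utf8_connect_kwargs():
    """Connection settings from the post, plus two encoding-related options."""
    return {
        "host": "127.0.0.1",
        "port": 3306,
        "user": "username",
        "passwd": "passwd",
        "db": "reporting",
        "charset": "utf8",      # use UTF-8 for this client connection
        "use_unicode": True,    # return text columns as unicode objects
    }


def fetch_rows(sql):
    """Run a query and return all rows as dictionaries."""
    # MySQLdb is a third-party package; it is imported here so the settings
    # above can be inspected without it installed.
    import MySQLdb
    import MySQLdb.cursors

    db = MySQLdb.connect(**utf8_connect_kwargs())
    try:
        cr = db.cursor(MySQLdb.cursors.DictCursor)
        cr.execute(sql)
        return cr.fetchall()
    finally:
        db.close()
```

If this is the cause, print repr(row['name']) should then show a unicode object instead of a str full of '?'.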

Thanks all for your nice help.


