Characters aren't displayed correctly

Hussein B hubaghdadi at gmail.com
Tue Mar 3 12:22:07 CET 2009


On Mar 3, 12:21 pm, John Machin <sjmac... at lexicon.net> wrote:
> On Mar 3, 8:49 pm, Hussein B <hubaghd... at gmail.com> wrote:
>
>
>
> > On Mar 3, 11:05 am, Hussein B <hubaghd... at gmail.com> wrote:
>
> > > On Mar 2, 5:40 pm, John Machin <sjmac... at lexicon.net> wrote:
>
> > > > On Mar 3, 1:50 am, Hussein B <hubaghd... at gmail.com> wrote:
>
> > > > > > On Mar 2, 4:31 pm, John Machin <sjmac... at lexicon.net> wrote:
>
> > > > > > > On Mar 2, 7:30 pm, Hussein B <hubaghd... at gmail.com> wrote:
>
> > > > > > > On Mar 1, 4:51 pm, Philip Semanchuk <phi... at semanchuk.com> wrote:
>
> > > > > > > > On Mar 1, 2009, at 8:31 AM, Hussein B wrote:
>
> > > > > > > > > Hey,
> > > > > > > > > > I'm retrieving records from a MySQL database that contains non-English
> > > > > > > > > > characters.
>
> > > > > > Can you reveal which language???
>
> > > > > Arabic
>
> > > > > > > > > Then I create a String that contains HTML markup and column values
> > > > > > > > > from the previous result set.
> > > > > > > > > +++++
> > > > > > > > > markup = u'''<table>.....'''
> > > > > > > > > for row in rows:
> > > > > > > > >     markup = markup + '<tr><td>' + row['id']
> > > > > > > > > > markup = markup + '</table>'
> > > > > > > > > +++++
> > > > > > > > > Then I'm sending the email according to this tip:
> > > > > > > > > > http://code.activestate.com/recipes/473810/
> > > > > > > > > > Well, the email contains ????? characters in place of the non-English ones.
> > > > > > > > > Any ideas?
>
> > > > > > > > > There are so many places where this could go wrong and you haven't
> > > > > > > > > narrowed down the problem.
>
> > > > > > > > Are the characters stored in the database correctly?
>
> > > > > > > Yes they are.
>
> > > > > > How do you KNOW that they are stored correctly? What makes you so
> > > > > > sure?
>
> > > > > > Because MySQL Query Browser displays them correctly; in addition, I use
> > > > > > BIRT as the reporting system and it shows them correctly.
>
> > > > > > > > Are they stored consistently (i.e. all using the same encoding, not  
> > > > > > > > some using utf-8 and others using iso-8859-1)?
>
> > > > > > > Yes.
>
> > > > > > So what is the encoding used to store them?
>
> > > > > Tables are created with UTF-8 encoding option
>
> > > > > > > > What are you getting out of the database? Is it being converted to  
> > > > > > > > Unicode correctly, or at all?
>
> > > > > > > > I don't know. How can I check this?
>
> > > > > > You could show us some of the output from the database query. As well
> > > > > > as
> > > > > >    print the_output
> > > > > > you should
> > > > > >    print repr(the_output)
> > > > > > and show us both, and also tell us what you *expect* to see.
>
> > > > > The result of print repr(row['name']) is '??? ??????'
> > > > > The '?' characters are supposed to be Arabic characters.
>
> > > > Are you expecting 3 Arabic characters, a space, and then 6 Arabic
> > > > characters?
>
> > > > We now have some interesting evidence: row['name'] is NOT a unicode
> > > > object -- otherwise the print would show u'??? ??????'; it's a str
> > > > object.
>
> > > > So: A utf8-encoded string is being decoded to unicode, and then re-
> > > > encoded to some other encoding, using the "replace" (with "?") error-
> > > > handling method. That shouldn't be hard to spot! It's about time you
> > > > showed us the code you are using to extract the data from the
> > > > database, including the print statements you have put in.
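To illustrate the diagnosis above, here is a minimal sketch of how Arabic text can turn into '?' characters. It assumes Python 3 and picks latin-1 as the "other encoding" purely for illustration; the thread does not say which encoding the connection actually used, and the sample string is made up.

```python
# Hypothetical sample value: two Arabic words of four letters each.
name = u'\u062f\u062e\u0648\u0644 \u0633\u0631\u064a\u0639'

# What the table is assumed to hold: the UTF-8 bytes of that text.
stored = name.encode('utf-8')

# Step 1: the bytes are decoded to unicode correctly.
decoded = stored.decode('utf-8')

# Step 2: the text is re-encoded to an encoding that has no Arabic
# letters (latin-1 is an assumption), using the "replace" error handler.
mangled = decoded.encode('latin-1', 'replace')

# Every Arabic letter has been replaced by '?'; only the space survives.
print(repr(mangled))
```

The original characters cannot be recovered from the mangled value, so the fix has to happen where the data is fetched, not afterwards.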
>
> > > This is how I retrieve the data:
>
> > > db = MySQLdb.connect(host="127.0.0.1", port=3306, user="username",
> > >                      passwd="passwd", db="reporting")
> > > cr = db.cursor(MySQLdb.cursors.DictCursor)
> > > cr.execute(sql)
> > > rows = cr.fetchall()
>
> > > Thanks all for your nice help.
>
> > Hey,
> > I added use_unicode and charset keyword params to the connect() method
>
> Hey, that was a brilliant idea -- I was just about to ask you to try
>  use_unicode=True, charset="utf8" ... what were the actual values that
> you used?

I didn't supply values for them the first time.
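
For the record, a sketch of the connect() call with both keywords supplied, using the values John suggested (use_unicode=True, charset="utf8"). The helper names connection_kwargs and fetch_rows are made up for illustration, the credentials are the placeholders from earlier in the thread, and it assumes the tables really are stored as UTF-8.

```python
def connection_kwargs():
    """Build the keyword arguments for MySQLdb.connect()."""
    return dict(
        host="127.0.0.1",
        port=3306,
        user="username",      # placeholder credentials
        passwd="passwd",
        db="reporting",
        use_unicode=True,     # return text columns as unicode objects
        charset="utf8",       # assumed to match the encoding of the tables
    )


def fetch_rows(sql):
    """Run a query and return every row as a dict."""
    # Imported here so the sketch can be loaded without MySQLdb installed.
    import MySQLdb
    import MySQLdb.cursors

    db = MySQLdb.connect(**connection_kwargs())
    try:
        cr = db.cursor(MySQLdb.cursors.DictCursor)
        cr.execute(sql)
        return cr.fetchall()
    finally:
        db.close()
```

Checking print repr(row['name']) on one fetched row, as John asked earlier, is a quick way to confirm that the values now come back as unicode.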

> Let's suppose that you used charset="XXXX" ... as far as I can tell,
> not being a mysqldb user myself, this means that your data tables and/
> or your default connection don't use XXXX as an encoding. If so, this
> might be an issue you might like to take up with whoever created the
> database that you are using.
>
> > and I got the following:
> > u'\u062f\u062e\u0648\u0644 \u0633\u0631\u064a\u0639
> > \u0634\u0647\u0631'
> > So characters are getting converted successfully.
>
> I guess so -- U+06nn sure are Arabic characters :-)
>
> However as suggested above, "converted from what?" might be worth
> pursuing if you like to understand what is going on instead of just
> applying magic recipes ;-)
>
> > Well, using the previous recipe for sending the mail: http://code.activestate.com/recipes/473810/
> > I got the following error:
>
> > Traceback (most recent call last):
> >   File "HtmlMail.py", line 52, in <module>
> >     s.sendmail(sender, receiver , msg.as_string())
>
> [big snip]
>
> > _handle_text
> >     self._fp.write(payload)
> > UnicodeEncodeError: 'ascii' codec can't encode characters in position
> > 115-118: ordinal not in range(128)
>
> > Again, any ideas guys? :)
>
> That recipe appears to have been written by an ascii bigot for ascii
> bigots :-(
>
> Try reading the docs for email.charset (that's the charset module in
> the email package).

Everything is working now. I did the following:
t = MIMEText(markup.encode('utf-8'), 'html', 'utf-8')
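
A self-contained sketch of that fix, assuming Python 3, where MIMEText accepts the text as str and encodes it itself once a charset is given (on Python 2, as used in this thread, the explicit markup.encode('utf-8') above is what worked). The sample markup and the helper name build_html_part are made up for illustration.

```python
from email.mime.text import MIMEText

# Hypothetical markup containing the Arabic text shown earlier in the thread.
markup = (u'<table><tr><td>'
          u'\u062f\u062e\u0648\u0644 \u0633\u0631\u064a\u0639 \u0634\u0647\u0631'
          u'</td></tr></table>')


def build_html_part(html):
    """Wrap HTML text in a MIME part that declares UTF-8 as its charset."""
    return MIMEText(html, 'html', 'utf-8')


part = build_html_part(markup)

# The serialized message is plain ASCII, so writing it out no longer
# needs the 'ascii' codec to encode Arabic characters.
wire_text = part.as_string()
wire_text.encode('ascii')

# Decoding the payload again gives back the original markup.
round_trip = part.get_payload(decode=True).decode('utf-8')
```

The important part is the third argument: without a charset the message is treated as us-ascii, which matches the UnicodeEncodeError in the traceback above.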

> Cheers,
> John

Thank you all, guys, and especially you, John. I owe you a HUGE bottle of
beer :D


