etree, minidom unicode

n00b pyn00b at
Fri Dec 5 18:46:50 CET 2008


i have a few questions concerning unicode and utf-8 handling and
would appreciate any insights.

1) i have a utf-8 encoded xml document and have been trying to use etree
to parse it and then commit the data to a mysql db. using etree, everything
i've been extracting is returned as a str, except text containing non-ascii
characters (code points > 127), which comes back as unicode. using minidom
on the same document, however, i get all unicode. is there a way to 'force'
etree to always return unicode?
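
to make the mixed str/unicode behaviour concrete, here is a minimal sketch of one possible workaround: normalising whatever etree returns into text after extraction. the helper name to_text and the tiny sample document are made up for illustration (they are not from the real file), and the code is written so that it should also run on python 3, where etree returns text for everything anyway.

```python
# -*- coding: utf-8 -*-
import xml.etree.ElementTree as ET


def to_text(value, encoding='utf-8'):
    """Hypothetical helper: return value as a text (unicode) string."""
    if value is None:
        return None
    if isinstance(value, bytes):  # on python 2, a plain 'str' is bytes
        return value.decode(encoding)
    return value  # already text


# made-up sample document, encoded as utf-8 bytes
xml_bytes = (u'<person><firstname>Charlotte</firstname>'
             u'<lastname>Bront\xeb</lastname></person>').encode('utf-8')

root = ET.fromstring(xml_bytes)
firstname = to_text(root.findtext('firstname'))
lastname = to_text(root.findtext('lastname'))
```

with this, both values end up as the same text type regardless of whether they contain non-ascii characters. it does not change what etree itself returns; it only evens things out afterwards.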

2) i'm using mysql 5.x on *nix (mac, linux) and after much messing
around, have things working, i.e. i get unicode from the (minidom) parser
and have set all the mysql and mysqldb attributes, but i get <str> back
from mysql. is that expected behavior?

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
from xml.dom import minidom
import MySQLdb
import codecs
from onix_model_01 import *

db = MySQLdb.connect(host='localhost', user='root', passwd='',
db='lsi', charset='utf8')
cur = db.cursor()
#cur.execute('SET NAMES utf8')
#cur.execute('SET CHARACTER SET utf8')
cur.execute('SET character_set_connection=utf8')
cur.execute('SET character_set_server=utf8')
cur.execute('''SHOW VARIABLES LIKE 'char%'; ''')
>>> print 'firstname, lastname types from xml: ', type(a.firstname), type(a.lastname)
firstname, lastname types from xml:  <type 'unicode'> <type 'unicode'>
>>> cur.execute('''INSERT INTO encoding_test VALUES(null, %s, %s)''', (a.firstname, a.lastname))

... now i'm getting the results back from mysql

>>> cur.execute('SELECT * FROM encoding_test')
>>> query = cur.fetchall()
>>> for q in query:
...     print q, type(q[0]), type(q[1]), type(q[2])
...     print q[1], q[2]
...     print repr(q[1]), repr(q[2])
...
(24L, 'Bront\xc3\xab', 'Charlotte ') <type 'long'> <type 'str'> <type 'str'>
Brontë Charlotte
'Bront\xc3\xab' 'Charlotte '

so everything is coming back as it should, but i thought i would get
the sql results back as unicode, not str ... what gives?
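
for reference, the str values in the row above are utf-8 encoded bytes ('\xc3\xab' is the two-byte utf-8 form of ë), so they can be decoded by hand. a minimal sketch, assuming every byte-string column is utf-8; decode_row is a hypothetical helper, and the row is copied from the session above as a literal (24L written as 24 so it also runs on python 3). MySQLdb.connect() also accepts a use_unicode argument, which is meant to return text columns as unicode; whether it would change the result in this particular setup has not been checked here.

```python
# -*- coding: utf-8 -*-


def decode_row(row, encoding='utf-8'):
    """Hypothetical helper: decode byte-string columns, leave other values alone."""
    return tuple(
        col.decode(encoding) if isinstance(col, bytes) else col
        for col in row
    )


# the row from the session above, written out as a literal
row = (24, b'Bront\xc3\xab', b'Charlotte ')
decoded = decode_row(row)
```

after this, decoded[1] and decoded[2] are text strings, and the integer id is passed through unchanged.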

finally, from a utf-8 perspective, is there any advantage to using innodb
over myisam?


More information about the Python-list mailing list