Unicode & mx.ODBC module

Chuck Bearden cbearden at hal-pc.org
Thu Mar 4 00:28:33 CET 2004


I'm having a tough time understanding how to manage Unicode when loading
data into an MS SQL server.  I'm still pretty new to Unicode, but I
think I have a grasp of the basic concepts.  I'm running ActivePython
2.3.2 Build 230 on Windows XP.  I have the eGenix mx.ODBC package
version 2.0.1 (thanks, Marc-Andre).

I have a script that is loading the contents of selected HTML files into
a database, along with information identifying the file.  Here is a
sample script:

-------------------------begin snippet-------------------------
import sys
import mx.ODBC.Windows

#-- initialize the db connection
dbname = 'theDb'
uname = 'theUser'
password = 'thePassword'
dsn = "DSN=%s;UID=%s;PWD=%s" % (dbname, uname, password)
con = mx.ODBC.Windows.DriverConnect(dsn)

#-- handle UTF-8 encoded Unicode; this worked when loading XML files
con.encoding = 'utf-8'
con.stringformat = mx.ODBC.Windows.UNICODE_STRINGFORMAT

cur = con.cursor()

#-- get the contents of our file (crudely: filename is 2nd arg)
html_f = open(sys.argv[1], 'r')
htmldata = html_f.read()
html_f.close()

#-- make statement string and insert values tuple, and execute
stmnt = """
  INSERT INTO pmLinkHTML
  (PMID, Ord, HTML, HTMLlen)
  VALUES
  (?, ?, ?, ?)
"""
val_t = (549, 0, htmldata, len(htmldata))
cur.execute(stmnt, val_t)

cur.close()
con.close()
--------------------------end snippet--------------------------

For my pains I am rewarded with:

  Traceback (most recent call last):
    File "./unitest.py", line 27, in ?
      cur.execute(stmnt, val_t)
  UnicodeDecodeError: 'utf8' codec can't decode byte 0xbe in position 45662: unexpected code byte

Byte 45662 of the HTML file is indeed "\xBE".  I don't think that should
be a problem.
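
As a sanity check that leaves the database out of the picture, here is a
minimal sketch of the same decode step.  It is written for current
Python 3 rather than the 2.3 interpreter used above.  The sample bytes
and the helper name try_decode are made up for illustration, and the
guess that the file is in a single-byte encoding such as Latin-1 rather
than UTF-8 is only an assumption; I have not verified the file's actual
encoding.

```python
# Hypothetical sample: a lone 0xBE byte, which is "3/4" (U+00BE) in Latin-1
# but is a continuation byte, and so not a valid start byte, in UTF-8.
raw = b"<p>3\xbe cups</p>"


def try_decode(data, encoding):
    """Return (text, None) on success or (None, error) on failure."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, exc


# Decoding as UTF-8 fails at the offset of the stray byte.
text_utf8, err_utf8 = try_decode(raw, "utf-8")

# Decoding as Latin-1 succeeds, because every byte value maps to a character.
text_latin1, err_latin1 = try_decode(raw, "latin-1")

print("utf-8  :", text_utf8, err_utf8)
print("latin-1:", ascii(text_latin1), err_latin1)
```

If the real file behaves like this sample, the data would need to be
decoded from its actual encoding into a Unicode object before being
handed to cur.execute(); whether that is the right fix for mx.ODBC is
exactly what I am unsure about.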

What am I doing wrong?  I have spent a fair bit of time searching the
newsgroup archives on Google in various ways, and consulting Python in a
Nutshell and the online standard library docs at python.org.  It may be
something quite obvious to a better-informed coder, but I am prepared to
learn.

Many thanks in advance.
Chuck Bearden
 



