MySQLdb not playing nice with unicode

Roy Smith roy at panix.com
Sat Mar 30 17:41:19 CET 2013


In article <roy-A1971D.12193930032013 at news.panix.com>,
 Roy Smith <roy at panix.com> wrote:

> My unicode-fu is a bit weak.  Are we looking at a Python problem, a 
> MySQLdb problem, or a problem with the underlying MySQL server?  We've 
> certainly inserted utf-8 data before without any problems.  It's 
> possible this is the first time we've tried to handle a character 
> outside the BMP.

Sigh.  As is so often the case, I found the answer shortly after posting 
this.

http://stackoverflow.com/questions/1890693/

It turns out MySQL (at least the version we're running) can't handle 
characters outside the BMP!

OK, that leads to the next question.  Is there any way I can (in Python 
2.7) detect when a string is not entirely in the BMP?  If I could find 
all the non-BMP characters, I could replace them with U+FFFD 
(REPLACEMENT CHARACTER) and life would be good (enough).
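
One way this detect-and-replace step might look is sketched below. It is 
only a sketch: the names has_non_bmp and replace_non_bmp are made up for 
illustration, and it assumes the input is already a unicode object rather 
than a UTF-8 byte string. The branch on sys.maxunicode is there because a 
narrow Python 2.7 build stores each non-BMP character as a surrogate 
pair, while a wide build stores it as a single code point.

```python
import re
import sys

if sys.maxunicode > 0xFFFF:
    # Wide build (and Python 3.3+): each non-BMP character is one code point.
    _NON_BMP = re.compile(u'[\U00010000-\U0010FFFF]')
else:
    # Narrow Python 2 build: a non-BMP character is a high surrogate
    # followed by a low surrogate, so match the pair as a unit.
    _NON_BMP = re.compile(u'[\uD800-\uDBFF][\uDC00-\uDFFF]')


def has_non_bmp(text):
    """Return True if the unicode string contains any non-BMP character."""
    return _NON_BMP.search(text) is not None


def replace_non_bmp(text, replacement=u'\uFFFD'):
    """Return text with every non-BMP character replaced by `replacement`."""
    return _NON_BMP.sub(replacement, text)
```

Calling replace_non_bmp(u'a\U0001F600b') should give u'a\ufffdb' on either 
kind of build, since the surrogate pair is replaced as a whole rather than 
one half at a time.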

Apparently, newer versions of MySQL have utf8mb4 which can handle this.  
One possibility is upgrading to a new MySQL, but if we could just catch 
and replace the non-BMP characters during ingestion, that would be a lot 
simpler.


