MySQLdb not playing nice with unicode

Roy Smith roy at
Sat Mar 30 17:41:19 CET 2013

> My unicode-fu is a bit weak.  Are we looking at a Python problem, a 
> MySQLdb problem, or a problem with the underlying MySQL server?  We've 
> certainly inserted utf-8 data before without any problems.  It's 
> possible this is the first time we've tried to handle a character 
> outside the BMP.

Sigh.  As is so often the case, I found the answer shortly after posting.

It turns out MySQL (at least the version we're running) can't handle 
characters outside the BMP!

OK, that leads to the next question.  Is there any way I can (in Python 
2.7) detect when a string is not entirely in the BMP?  If I could find 
all the non-BMP characters, I could replace them with U+FFFD 
(REPLACEMENT CHARACTER) and life would be good (enough).
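
One way this detection and replacement might look is sketched below.  It is 
only a sketch: the helper names has_non_bmp and replace_non_bmp are made up 
for illustration, and it assumes the input is already a unicode object, not 
a utf-8 byte string.  On a narrow (UCS-2) Python 2.7 build a non-BMP 
character is stored as a surrogate pair, so the pattern depends on 
sys.maxunicode.

```python
import re
import sys

# Hypothetical helpers for illustration; not part of MySQLdb or the stdlib.
if sys.maxunicode > 0xFFFF:
    # Wide build (and Python 3.3+): one code point per character.
    _NON_BMP = re.compile(u'[\U00010000-\U0010FFFF]')
else:
    # Narrow build: a non-BMP character is a high surrogate followed by
    # a low surrogate, so match the pair as a unit.
    _NON_BMP = re.compile(u'[\uD800-\uDBFF][\uDC00-\uDFFF]')


def has_non_bmp(text):
    """Return True if the unicode string contains a character above U+FFFF."""
    return _NON_BMP.search(text) is not None


def replace_non_bmp(text, replacement=u'\uFFFD'):
    """Replace each non-BMP character with U+FFFD (or the given string)."""
    return _NON_BMP.sub(replacement, text)
```

Calling replace_non_bmp() on each text field just before the INSERT would 
leave BMP-only strings unchanged and turn every non-BMP character into a 
single U+FFFD.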

Apparently, newer versions of MySQL have utf8mb4 which can handle this.  
One possibility is upgrading to a new MySQL, but if we could just catch 
and replace the non-BMP characters during ingestion, that would be a lot 

