MySQLdb not playing nice with unicode
roy at panix.com
Sat Mar 30 17:41:19 CET 2013
In article <roy-A1971D.12193930032013 at news.panix.com>,
Roy Smith <roy at panix.com> wrote:
> My unicode-fu is a bit weak. Are we looking at a Python problem, a
> MySQLdb problem, or a problem with the underlying MySQL server? We've
> certainly inserted utf-8 data before without any problems. It's
> possible this is the first time we've tried to handle a character
> outside the BMP.
Sigh. As is so often the case, I found the answer shortly after posting.
It turns out MySQL (at least the version we're running) can't handle
characters outside the BMP!
OK, that leads to the next question. Is there any way I can (in Python
2.7) detect when a string is not entirely in the BMP? If I could find
all the non-BMP characters, I could replace them with U+FFFD
(REPLACEMENT CHARACTER) and life would be good (enough).
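
One way this might be done is with a regular expression. This is only a
sketch: the names has_non_bmp and replace_non_bmp are made up for
illustration, and it assumes the input is already a unicode object (decoded
from UTF-8), not a byte string. On a narrow Python 2.7 build, non-BMP
characters are stored as surrogate pairs, so the pattern has to differ from
the one used on a wide build; sys.maxunicode tells the two apart.

```python
import re
import sys

if sys.maxunicode > 0xFFFF:
    # Wide build (or Python 3.3+): a non-BMP character is one code point.
    _NON_BMP = re.compile(u'[\U00010000-\U0010FFFF]')
else:
    # Narrow build: a non-BMP character is a high/low surrogate pair.
    _NON_BMP = re.compile(u'[\uD800-\uDBFF][\uDC00-\uDFFF]')


def has_non_bmp(text):
    """Return True if the unicode string contains any non-BMP character."""
    return _NON_BMP.search(text) is not None


def replace_non_bmp(text, replacement=u'\uFFFD'):
    """Replace each non-BMP character with U+FFFD (or another replacement)."""
    return _NON_BMP.sub(replacement, text)
```

Calling replace_non_bmp() on each string just before it is handed to
MySQLdb should leave BMP-only text untouched and swap anything else for
the replacement character.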
Apparently, newer versions of MySQL have utf8mb4, which can handle this.
One possibility is upgrading to a newer MySQL, but if we could just catch
and replace the non-BMP characters during ingestion, that would be a lot