MySQLdb not playing nice with unicode
Roy Smith
roy at panix.com
Sat Mar 30 12:41:19 EDT 2013
In article <roy-A1971D.12193930032013 at news.panix.com>,
Roy Smith <roy at panix.com> wrote:
> My unicode-fu is a bit weak. Are we looking at a Python problem, a
> MySQLdb problem, or a problem with the underlying MySQL server? We've
> certainly inserted utf-8 data before without any problems. It's
> possible this is the first time we've tried to handle a character
> outside the BMP.
Sigh. As is so often the case, I found the answer shortly after posting
this.
http://stackoverflow.com/questions/1890693/
It turns out MySQL (at least the version we're running) can't handle
characters outside the BMP!
OK, that leads to the next question. Is there any way I can (in Python
2.7) detect when a string is not entirely in the BMP? If I could find
all the non-BMP characters, I could replace them with U+FFFD
(REPLACEMENT CHARACTER) and life would be good (enough).
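For what it's worth, here is a minimal sketch of one way that detection and replacement might look. It assumes the input is already a unicode object (a UTF-8 byte string would need .decode('utf-8') first). The names has_non_bmp and replace_non_bmp are made up for illustration. The try/except is there because a narrow Python 2.7 build stores non-BMP characters as surrogate pairs, and on such builds the wide-range character class should fail to compile, so the pattern falls back to matching a high surrogate followed by a low surrogate.

```python
import re

# U+FFFD REPLACEMENT CHARACTER
REPLACEMENT = u'\ufffd'

try:
    # Wide builds (and Python 3): a non-BMP character is a single code point.
    _NON_BMP_RE = re.compile(u'[\U00010000-\U0010FFFF]')
except re.error:
    # Narrow builds: a non-BMP character is a surrogate pair, and the
    # range above is rejected, so match the pair explicitly instead.
    _NON_BMP_RE = re.compile(u'[\uD800-\uDBFF][\uDC00-\uDFFF]')


def has_non_bmp(text):
    """Return True if the unicode string contains any non-BMP character."""
    return _NON_BMP_RE.search(text) is not None


def replace_non_bmp(text):
    """Return a copy of text with each non-BMP character replaced by U+FFFD."""
    return _NON_BMP_RE.sub(REPLACEMENT, text)
```

Something like replace_non_bmp(value) could then be applied to each string just before it is handed to the INSERT, leaving BMP-only strings unchanged.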
Apparently, newer versions of MySQL have utf8mb4 which can handle this.
One possibility is upgrading to a new MySQL, but if we could just catch
and replace the non-BMP characters during ingestion, that would be a lot
simpler.