Putting Unicode characters in JSON

Chris Angelico rosuav at gmail.com
Fri Mar 23 20:21:09 EDT 2018


On Sat, Mar 24, 2018 at 11:11 AM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
> On Fri, 23 Mar 2018 07:46:16 -0700, Tobiah wrote:
>
>> If I changed my database tables to all be UTF-8 would this work cleanly
>> without any decoding?
>
> Not reliably or safely. It will appear to work so long as you have only
> pure ASCII strings from the database, and then crash when you don't:
>
> py> text_from_database = u"hello wörld".encode('latin1')
> py> print text_from_database
> hello w�rld
> py> json.dumps(text_from_database)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/local/lib/python2.7/json/__init__.py", line 231, in dumps
>     return _default_encoder.encode(obj)
>   File "/usr/local/lib/python2.7/json/encoder.py", line 195, in encode
>     return encode_basestring_ascii(o)
> UnicodeDecodeError: 'utf8' codec can't decode byte 0xf6 in position 7:
> invalid start byte
>

If the database has been configured to use UTF-8 (as mentioned, that's
"utf8mb4" in MySQL), you won't get that byte sequence back. You'll get
back valid UTF-8. If you ever don't, that's a MySQL bug, and not your
fault. So yes, it WILL work cleanly. Reliably and safely.
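
To make the difference concrete, here's a rough sketch of the two cases. The sample string is the one from Steven's session; the to_json helper is just a name I made up for illustration, and this is written for Python 3, where bytes have to be decoded explicitly before json.dumps will accept them (Python 2 attempts that decode implicitly, which is where the quoted traceback comes from).

```python
import json

# Assumed sample values: the same text as UTF-8 (what a utf8mb4-configured
# database should hand back) and as Latin-1 (the bytes in the quoted session).
utf8_bytes = "hello w\u00f6rld".encode("utf-8")
latin1_bytes = "hello w\u00f6rld".encode("latin-1")


def to_json(raw):
    """Hypothetical helper: decode bytes as UTF-8, then serialise to JSON."""
    return json.dumps(raw.decode("utf-8"))


# Valid UTF-8 decodes and serialises without trouble.
utf8_result = to_json(utf8_bytes)

# The Latin-1 bytes contain 0xf6, which is not a valid UTF-8 start byte.
try:
    to_json(latin1_bytes)
    latin1_failed = False
except UnicodeDecodeError:
    latin1_failed = True
```

The failure only shows up when the bytes aren't UTF-8 to begin with, which is exactly the case a UTF-8 database is supposed to rule out.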

ChrisA


