Putting Unicode characters in JSON
Thomas Jollans
tjol at tjol.eu
Thu Mar 22 20:27:46 EDT 2018
On 22/03/18 20:46, Tobiah wrote:
> I was reading though, that JSON files must be encoded with UTF-8. So
> should I be doing string.decode('latin-1').encode('utf-8')? Or does
> the json module do that for me when I give it a unicode object?
Definitely not. In fact, in Python 3 that won't even work, because
json.dumps() does not accept bytes:
>>> import json
>>> s = 'déjà vu'.encode('latin1')
>>> s
b'd\xe9j\xe0 vu'
>>> json.dumps(s.decode('latin1').encode('utf8'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python3.6/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.6/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python3.6/json/encoder.py", line 180, in default
    o.__class__.__name__)
TypeError: Object of type 'bytes' is not JSON serializable
>>>
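For contrast, here is a minimal sketch of what does work in Python 3:
decode the Latin-1 bytes to a str once and hand that str to json.dumps(),
with no re-encoding step. The variable names are only illustrative.

```python
import json

# Bytes as they might arrive from a Latin-1 source.
raw = 'déjà vu'.encode('latin1')

# Decode once to get a str; json.dumps() accepts str, not bytes.
text = raw.decode('latin1')

# With the default ensure_ascii=True, non-ASCII characters are escaped.
encoded = json.dumps(text)
print(encoded)  # "d\u00e9j\u00e0 vu"
```

Loading the result with json.loads() gives back the original str.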
You should make sure that either the file you're writing to is opened as
UTF-8 text (which is what you need if you pass ensure_ascii=False), or
the ensure_ascii parameter of dumps() or dump() is left at True (the
default). In the latter case the output is pure ASCII, so you can write
it in ASCII or any ASCII-compatible encoding (e.g. UTF-8).
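The two options could look roughly like this. It is only a sketch: the
file names, the temporary directory and the sample data are made up for
the example.

```python
import json
import os
import tempfile

data = {'phrase': 'déjà vu'}
tmpdir = tempfile.mkdtemp()  # scratch directory, just for this example

# Option 1: keep the characters as they are and open the file as UTF-8 text.
utf8_path = os.path.join(tmpdir, 'utf8.json')
with open(utf8_path, 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False)

# Option 2: leave ensure_ascii at its default (True). The output is pure
# ASCII, so even a file opened with encoding='ascii' can hold it.
ascii_path = os.path.join(tmpdir, 'ascii.json')
with open(ascii_path, 'w', encoding='ascii') as f:
    json.dump(data, f)

# Both files are valid UTF-8 and load back to the same data.
with open(utf8_path, encoding='utf-8') as f:
    from_utf8 = json.load(f)
with open(ascii_path, encoding='utf-8') as f:
    from_ascii = json.load(f)
```

The first file contains the accented characters as UTF-8 bytes; the
second contains \u escapes instead. A JSON parser treats both the same.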
Basically, the default behaviour of the json module means you don't
really have to worry about encodings at all once your original data is
in unicode strings.
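A small illustration of why the default is so forgiving (the sample
values are arbitrary): the text dumps() produces by default contains only
ASCII characters, so it comes out as the same bytes under any
ASCII-compatible encoding.

```python
import json

# Default ensure_ascii=True: every non-ASCII character becomes a \u escape.
payload = json.dumps({'phrase': 'déjà vu', 'symbol': '€'})

# The escaped text encodes to identical bytes in all of these encodings.
as_ascii = payload.encode('ascii')
as_latin1 = payload.encode('latin-1')
as_utf8 = payload.encode('utf-8')
same_bytes = as_ascii == as_latin1 == as_utf8

# Decoding and loading restores the original unicode strings.
restored = json.loads(as_utf8.decode('utf-8'))
```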
-- Thomas