[pypy-issue] [issue1627] json(ensure_ascii=False)

Tobias Oberstein tracker at bugs.pypy.org
Sun Nov 10 21:58:37 CET 2013


Tobias Oberstein <tobias.oberstein at gmail.com> added the comment:

The JSON produced by Python's `json.dumps` here is invalid: it is not valid
UTF-8. Reproducing this behavior in PyPy only propagates that bug.

It is invalid because `\xc0` can never occur in well-formed UTF-8: it is not a
continuation octet (those are 0x80..0xBF), and as a lead octet it could only
begin an overlong two-byte encoding (see the DFA in
http://bjoern.hoehrmann.de/utf-8/decoder/dfa/).
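
To make the octet classes concrete, here is a minimal sketch (written for
Python 3; the helper name `utf8_octet_role` is hypothetical and not part of any
library) that sorts a single octet into the roles allowed by RFC 3629. It only
looks at one octet in isolation, so it is far weaker than a full DFA-based
validator, but it is enough to show where 0xC0 falls.

```python
def utf8_octet_role(octet: int) -> str:
    """Classify one octet by the role it can play in well-formed UTF-8.

    Hypothetical helper for illustration; ranges follow RFC 3629.
    """
    if not 0 <= octet <= 0xFF:
        raise ValueError("octet must be in range 0..255")
    if octet <= 0x7F:
        return "ascii"         # single-octet character
    if octet <= 0xBF:
        return "continuation"  # 0x80..0xBF, only valid after a lead octet
    if octet <= 0xC1:
        return "invalid"       # 0xC0, 0xC1 could only start overlong forms
    if octet <= 0xDF:
        return "lead-2"        # starts a two-octet sequence
    if octet <= 0xEF:
        return "lead-3"        # starts a three-octet sequence
    if octet <= 0xF4:
        return "lead-4"        # starts a four-octet sequence
    return "invalid"           # 0xF5..0xFF would exceed U+10FFFF


role_of_c0 = utf8_octet_role(0xC0)
```

With these ranges, `role_of_c0` is `"invalid"`, while 0x80 is a continuation
octet and 0xC3 is a two-octet lead.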

Here is the proof:

$ python
Python 2.7.5 (default, May 15 2013, 22:43:36) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from autobahn.utf8validator import Utf8Validator
>>> v = Utf8Validator()
>>> v.validate("hello")
(True, True, 5, 5)
>>> v.reset()
>>> v.validate("\xc0")
(False, False, 0, 0)
>>> import json
>>> json.dumps("\xc0", ensure_ascii = False)
'"\xc0"'
>>> "\xc0".decode("utf8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc0 in position 0: invalid start byte
>>> json.dumps("hello", ensure_ascii = False)
'"hello"'
>>>
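
For comparison, a minimal sketch of what a UTF-8-clean result would look like,
assuming Python 3 (where text and bytes are separate types) rather than the
2.7 session above. The helper names `is_valid_utf8` and `dumps_utf8` are
hypothetical; only `json.dumps`, `str.encode` and `bytes.decode` are standard
library calls.

```python
import json


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def dumps_utf8(obj) -> bytes:
    """Serialize obj to JSON text, then encode that text as UTF-8 octets."""
    return json.dumps(obj, ensure_ascii=False).encode("utf-8")


# The raw octets from the session above are rejected by a strict decoder.
raw_is_valid = is_valid_utf8(b'"\xc0"')

# The character U+00C0, serialized as text and then encoded, is well formed.
encoded = dumps_utf8("\u00c0")
encoded_is_valid = is_valid_utf8(encoded)
```

Here `raw_is_valid` is `False`, while `encoded` is `b'"\xc3\x80"'` and passes
the check: the character is carried as the two-octet sequence C3 80 instead of
a bare 0xC0.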

----------
nosy: +oberstet

________________________________________
PyPy bug tracker <tracker at bugs.pypy.org>
<https://bugs.pypy.org/issue1627>
________________________________________

