[issue13769] json.dump(ensure_ascii=False) return str instead of unicode
New submission from Марк Коренберг <socketpair@gmail.com>:

$ ipython
In [1]: type(json.dumps({'a':'b'}, ensure_ascii=False))
Out[1]: <type 'str'>
In [2]: type(json.dumps({'a':u'b'}, ensure_ascii=False))
Out[2]: <type 'unicode'>
-----------------------
Documentation:
If ensure_ascii is False, then the return value will be a unicode instance.
--------------------------------
Not applicable to python3

----------
assignee: docs@python
components: Documentation, Library (Lib)
messages: 151066
nosy: docs@python, mmarkk
priority: normal
severity: normal
status: open
title: json.dump(ensure_ascii=False) return str instead of unicode
type: behavior
versions: Python 2.6, Python 2.7

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue13769>
_______________________________________

Changes by Martin v. Löwis <martin@v.loewis.de>:

----------
versions: -Python 2.6

Terry J. Reedy <tjreedy@udel.edu> added the comment:

Ezio, Raymond: is it the doc that is wrong?

----------
nosy: +ezio.melotti, rhettinger, terry.reedy
stage: -> needs patch

Ezio Melotti <ezio.melotti@gmail.com> added the comment:

The docstring says:
"""
If ``ensure_ascii`` is false, then the return value will be a ``unicode`` instance subject to normal Python ``str`` to ``unicode`` coercion rules instead of being escaped to an ASCII ``str``.
"""

----------

Petri Lehtinen added the comment:

It seems to me that when ensure_ascii is False, the return value will be a unicode instance if and only if there's a unicode object anywhere in the input.

>>> json.dumps({'foo': 'bar'}, ensure_ascii=False)
'{"foo": "bar"}'
>>> json.dumps({'foo': u'bar'}, ensure_ascii=False)
u'{"foo": "bar"}'
>>> json.dumps({'foo': u'äiti'}, ensure_ascii=False)
u'{"foo": "\xe4iti"}'
>>> json.dumps({'foo': u'äiti'.encode('utf-8')}, ensure_ascii=False)
'{"foo": "\xc3\xa4iti"}'
>>> json.dumps({'foo': u'äiti'.encode('utf-16')}, ensure_ascii=False)
'{"foo": "\xff\xfe\xe4\\u0000i\\u0000t\\u0000i\\u0000"}'

----------
nosy: +petri.lehtinen

Petri Lehtinen added the comment:

It may also be unicode if the encoding parameter is used even if there are no unicode objects in the input.

>>> json.dumps([u'Ş'.encode('iso-8859-9')], encoding='iso-8859-9', ensure_ascii=False)
u'["\u015e"]'

----------

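Because the return type on 2.7 depends on the input, a caller who needs text cannot rely on json.dumps alone. The following is a minimal sketch of one way to normalize the result, not anything proposed in this issue. The names `dumps_text` and `input_encoding` are hypothetical, and the decode step assumes that any byte strings in the input are UTF-8 (it would fail on the UTF-16 example above). On Python 3, json.dumps already returns text, so the branch is never taken.

```python
import json


def dumps_text(obj, input_encoding="utf-8", **kwargs):
    """Hypothetical helper: serialize obj and always return a text string."""
    result = json.dumps(obj, ensure_ascii=False, **kwargs)
    if isinstance(result, bytes):
        # Reached only on Python 2, where bytes is an alias of str: no
        # unicode object was in the input, so json.dumps returned a byte
        # string. Decode it, assuming the bytes are in input_encoding.
        result = result.decode(input_encoding)
    return result


text = dumps_text({"foo": u"\xe4iti"})
```

The helper leaves the choice of encoding to the caller instead of guessing, since the json module has no way to know what encoding a byte string was produced with.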
Petri Lehtinen added the comment:

Attached a patch for 2.7 that updates docs and docstrings.

----------
keywords: +needs review, patch
stage: needs patch -> patch review
Added file: http://bugs.python.org/file27032/issue13769.patch

Changes by Petri Lehtinen <petri@digip.org>:

----------
nosy: +pitrou

Petri Lehtinen added the comment:

Attached an updated patch, which is more explicit on what ensure_ascii actually does.

----------
Added file: http://bugs.python.org/file27049/issue13769_v2.patch

Petri Lehtinen added the comment:

Attached yet another patch. It explains what input causes the result to be unicode instead of str.

----------
Added file: http://bugs.python.org/file27064/issue13769_v3.patch

Roundup Robot added the comment:

New changeset a1884b3027c5 by Petri Lehtinen in branch '2.7':
#13769: Enhance docs for ensure_ascii semantics in JSON decoding functions
http://hg.python.org/cpython/rev/a1884b3027c5

----------
nosy: +python-dev

Petri Lehtinen added the comment:

Fixed, thanks.

----------
keywords: -needs review
resolution: -> fixed
stage: patch review -> committed/rejected
status: open -> closed

Martijn Pieters added the comment:

I'd say this is a bug in the library, not the documentation. The library varies the output type, making it impossible to use `json.dump()` with an `io.open()` object, as the library will *mix data types* when writing. That is *terrible* behaviour.

----------
nosy: +mjpieters

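One possible workaround for the mixed chunk types, sketched under assumptions and not taken from this issue: wrap the text stream in an object whose write() decodes any byte-string chunk before passing it on. `TextCoercingWriter` is a hypothetical name. The sketch assumes that byte chunks are UTF-8 and that each chunk can be decoded on its own. io.StringIO stands in for a file opened in text mode with io.open().

```python
import io
import json


class TextCoercingWriter(object):
    """Hypothetical wrapper: pass only text chunks to the wrapped stream."""

    def __init__(self, stream, encoding="utf-8"):
        self.stream = stream
        self.encoding = encoding

    def write(self, chunk):
        if isinstance(chunk, bytes):
            # On Python 2, json.dump may emit byte-string chunks; decode
            # them, assuming they are in self.encoding.
            chunk = chunk.decode(self.encoding)
        return self.stream.write(chunk)


buf = io.StringIO()
json.dump({"foo": u"\xe4iti"}, TextCoercingWriter(buf), ensure_ascii=False)
written = buf.getvalue()
```

On Python 3 every chunk is already text, so the wrapper simply forwards each write.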
Terry J. Reedy added the comment:

The revised doc admits the problem: "If *ensure_ascii* is False, some chunks written to *fp* may be unicode instances. Unless fp.write() explicitly understands unicode (as in codecs.getwriter) this is likely to cause an error."

Making text be unicode in 3.x is our attempt at a generic fix to the problems resulting from the bug-prone 2.x 'text may be bytes or unicode' design. Since continued 2.7 support is aimed at supporting legacy code, we are very reluctant to make behavior changes that could break working code.

----------

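The codecs.getwriter approach named in the revised doc could look like the sketch below. It is an illustration under assumptions, not code from the patch: io.BytesIO stands in for a file opened in binary mode, and UTF-8 is an arbitrary choice of output encoding. On 2.7, a byte-string chunk that contains non-ASCII bytes may still raise an error inside the writer, so the sketch is safest when all non-ASCII input is passed as unicode.

```python
import codecs
import io
import json

# A binary buffer stands in for a file opened in binary mode.
raw = io.BytesIO()

# codecs.getwriter returns a StreamWriter class; instantiating it around the
# binary stream gives an object whose write() encodes text chunks as UTF-8.
writer = codecs.getwriter("utf-8")(raw)

json.dump({"foo": u"\xe4iti"}, writer, ensure_ascii=False)

encoded = raw.getvalue()
```

The underlying stream only ever receives bytes, which avoids the mixed-type writes described in the previous comment.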
participants (7)
- Ezio Melotti
- Martijn Pieters
- Martin v. Löwis
- Petri Lehtinen
- Roundup Robot
- Terry J. Reedy
- Марк Коренберг