[Python-Dev] Dropping bytes "support" in json

Antoine Pitrou solipsis at pitrou.net
Thu Apr 9 07:15:09 CEST 2009


Guido van Rossum <guido <at> python.org> writes:
> 
> I'm kind of surprised that a serialization protocol like JSON wouldn't
> support reading/writing bytes (as the serialized format -- I don't
> care about having bytes as values, since JavaScript doesn't have
> something equivalent AFAIK, and hence JSON doesn't allow it IIRC).
> Marshal and Pickle, for example, *always* treat the serialized format
> as bytes. And since in most cases it will be sent over a socket, at
> some point the serialized representation *will* be bytes, I presume.
> What makes supporting this hard?

It's not hard; it just means a lot of duplicated code if the library wants to
support both str and bytes in an optimized way, as Martin alluded to. This
duplicated code already exists in the C parts to support the 2.x semantics of
accepting unicode objects as well as str, but not in the Python parts, which
explains why the bytes support is broken in py3k - in 2.x, the same Python code
can be used for str and unicode.

On the other hand, supporting it without chasing the last few percent of
performance should be fairly trivial (by encoding/decoding before doing the
processing proper), and it would avoid the current duplicated code.
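
To make the "encode/decode before processing" idea concrete, here is a minimal
sketch, assuming py3k semantics (str is text, bytes is binary). The names
loads_any and dumps_bytes are hypothetical helpers made up for illustration;
they are not part of the json module.

```python
import json


def loads_any(s, encoding="utf-8"):
    """Hypothetical helper: accept str or bytes and always parse text."""
    if isinstance(s, (bytes, bytearray)):
        # Decode once up front so only the str code path has to exist.
        s = s.decode(encoding)
    return json.loads(s)


def dumps_bytes(obj, encoding="utf-8"):
    """Hypothetical helper: serialize to text, then encode for the wire."""
    return json.dumps(obj).encode(encoding)


print(loads_any(b'"hi"'))
print(dumps_bytes({"a": 1}))
```

This costs one extra pass over the payload, which is the performance being
traded away, but the parser itself only ever sees str.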

As for reading/writing bytes over the wire, JSON is often used in the same
context as HTML: you are supposed to know the charset and decode/encode the
payload using that charset. However, the RFC specifies a default encoding of
utf-8. (*)


(*) http://www.ietf.org/rfc/rfc4627.txt

The RFC also specifies a detection algorithm for encodings that are not supersets of ASCII
(“Since the first two characters of a JSON text will always be ASCII
   characters [RFC0020], it is possible to determine whether an octet
   stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
   at the pattern of nulls in the first four octets.”), but it is not
implemented in the json module:

>>> json.loads('"hi"')
'hi'
>>> json.loads(u'"hi"'.encode('utf16'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/antoine/cpython/__svn__/Lib/json/__init__.py", line 310, in loads
    return _default_decoder.decode(s)
  File "/home/antoine/cpython/__svn__/Lib/json/decoder.py", line 344, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/antoine/cpython/__svn__/Lib/json/decoder.py", line 362, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
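
For reference, the null-pattern check quoted from the RFC could be sketched
roughly as follows, assuming py3k (iterating over bytes yields ints). The
function names are hypothetical and this is not what the json module does. It
only covers input without a byte order mark, since the RFC's table does not
address BOMs; note that encode('utf16') as used above prepends one, so that
particular input would still need separate handling.

```python
import json


def detect_json_encoding(data):
    """Guess the encoding from the nulls in the first four octets.

    Patterns from RFC 4627, section 3:
        00 00 00 xx  UTF-32BE
        00 xx 00 xx  UTF-16BE
        xx 00 00 00  UTF-32LE
        xx 00 xx 00  UTF-16LE
        xx xx xx xx  UTF-8
    """
    head = data[:4]
    if len(head) == 4:
        nulls = tuple(b == 0 for b in head)
        if nulls == (True, True, True, False):
            return "utf-32-be"
        if nulls == (False, True, True, True):
            return "utf-32-le"
        if nulls == (True, False, True, False):
            return "utf-16-be"
        if nulls == (False, True, False, True):
            return "utf-16-le"
    # Fewer than four octets, or no recognized null pattern.
    return "utf-8"


def loads_detected(data):
    """Hypothetical helper: detect the encoding, decode, then parse."""
    return json.loads(data.decode(detect_json_encoding(data)))


print(loads_detected('["hi"]'.encode("utf-16-le")))
```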

Regards

Antoine.



