[Python-Dev] Dropping bytes "support" in json
solipsis at pitrou.net
Thu Apr 9 07:15:09 CEST 2009
Guido van Rossum <guido <at> python.org> writes:
> I'm kind of surprised that a serialization protocol like JSON wouldn't
> support reading/writing bytes (as the serialized format -- I don't
> [...] something equivalent AFAIK, and hence JSON doesn't allow it IIRC).
> Marshal and Pickle, for example, *always* treat the serialized format
> as bytes. And since in most cases it will be sent over a socket, at
> some point the serialized representation *will* be bytes, I presume.
> What makes supporting this hard?
It's not hard; it just means a lot of duplicated code if the library wants to
support both str and bytes in an optimized way, as Martin alluded to. This
duplicated code already exists in the C parts to support the 2.x semantics of
accepting unicode objects as well as str, but not in the Python parts, which
explains why the bytes support is broken in py3k - in 2.x, the same Python code
can be used for str and unicode.
On the other hand, supporting it without chasing the last few percent of
performance should be fairly trivial (by encoding/decoding before doing the
processing proper), and it would avoid the current duplicated code.
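
A minimal sketch of that simpler approach, assuming the bytes are UTF-8 unless the
caller says otherwise. The helper names loads_bytes and dumps_bytes are hypothetical
and are not part of the json module; the point is only that the str code path is
reused unchanged:

```python
import json


def loads_bytes(data, encoding="utf-8"):
    """Decode bytes to str up front, then reuse the str-only parser."""
    if isinstance(data, (bytes, bytearray)):
        data = data.decode(encoding)
    return json.loads(data)


def dumps_bytes(obj, encoding="utf-8"):
    """Serialize to str as usual, then encode once at the very end."""
    return json.dumps(obj).encode(encoding)
```

This costs one extra pass over the payload in each direction, which is the
performance being traded away for not duplicating the parsing code.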
As for reading/writing bytes over the wire, JSON is often used in the same
context as HTML: you are supposed to know the charset and decode/encode the
payload using that charset. However, the RFC specifies a default encoding of
UTF-8.
The RFC also specifies a discrimination algorithm for non-supersets of ASCII
(“Since the first two characters of a JSON text will always be ASCII
characters [RFC0020], it is possible to determine whether an octet
stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
at the pattern of nulls in the first four octets.”), but it is not
implemented in the json module:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/antoine/cpython/__svn__/Lib/json/__init__.py", line 310, in loads
  File "/home/antoine/cpython/__svn__/Lib/json/decoder.py", line 344, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/antoine/cpython/__svn__/Lib/json/decoder.py", line 362, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
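
For illustration, the null-pattern test quoted above could be sketched roughly as
follows. This is not what the json module does; detect_json_encoding is a
hypothetical helper, and it assumes the input has no byte order mark and starts
with two ASCII characters, as the RFC text presumes:

```python
import json


def detect_json_encoding(data):
    """Guess the encoding from the nulls in the first four octets."""
    head = data[:4]
    if len(head) < 4:
        # Fewer than four octets cannot hold two UTF-16/UTF-32 characters.
        return "utf-8"
    nulls = tuple(octet == 0 for octet in head)
    patterns = {
        (True, True, True, False): "utf-32-be",   # 00 00 00 xx
        (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
        (False, True, True, True): "utf-32-le",   # xx 00 00 00
        (False, True, False, True): "utf-16-le",  # xx 00 xx 00
    }
    # Any other pattern (typically no nulls at all) is treated as UTF-8.
    return patterns.get(nulls, "utf-8")


def loads_autodetect(data):
    """Decode with the guessed encoding, then parse as str."""
    return json.loads(data.decode(detect_json_encoding(data)))
```

Note that Python's plain "utf-16" and "utf-32" codecs prepend a byte order mark
when encoding, which this sketch does not handle; the explicit "-le" and "-be"
variants do not.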