Guido van Rossum <guido <at> python.org> writes:
It's not hard; it just means a lot of duplicated code if the library wants to support both str and bytes in an optimized way, as Martin alluded to. This duplicated code already exists in the C parts to support the 2.x semantics of accepting unicode objects as well as str, but not in the Python parts, which explains why the bytes support is broken in py3k. In 2.x, the same Python code can be used for str and unicode.
On the other hand, supporting it without chasing the last few percent of performance should be fairly trivial (by encoding/decoding before doing the processing proper), and it would avoid the current duplicated code.
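To illustrate the "decode first, then process" approach, here is a minimal sketch. The helper names loads_any and dumps_bytes are hypothetical (they are not part of the json module), and the UTF-8 default is an assumption taken from the RFC default discussed below.

```python
import json

def loads_any(data, encoding="utf-8"):
    """Hypothetical helper: accept str or bytes and parse as JSON.

    Bytes are decoded up front so that a single str-only code path
    does the actual parsing (no duplicated str/bytes logic).
    """
    if isinstance(data, (bytes, bytearray)):
        data = data.decode(encoding)
    return json.loads(data)

def dumps_bytes(obj, encoding="utf-8"):
    """Hypothetical helper: serialize to str, then encode for the wire."""
    return json.dumps(obj).encode(encoding)
```

This trades a little speed (one extra pass over the payload) for a single implementation, which is the trade-off described above.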
As for reading/writing bytes over the wire, JSON is often used in the same context as HTML: you are supposed to know the charset and decode/encode the payload using that charset. However, the RFC specifies a default encoding of utf-8. (*)
The RFC also specifies a discrimination algorithm for non-supersets of ASCII (“Since the first two characters of a JSON text will always be ASCII characters [RFC0020], it is possible to determine whether an octet stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking at the pattern of nulls in the first four octets.”), but it is not implemented in the json module:
>>> json.loads('"hi"')
'hi'
>>> json.loads(u'"hi"'.encode('utf16'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/antoine/cpython/__svn__/Lib/json/__init__.py", line 310, in loads
    return _default_decoder.decode(s)
  File "/home/antoine/cpython/__svn__/Lib/json/decoder.py", line 344, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/antoine/cpython/__svn__/Lib/json/decoder.py", line 362, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
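For reference, the null-pattern discrimination quoted from the RFC could be sketched roughly as follows. The function names guess_json_encoding and loads_detect are hypothetical, and the fallback to UTF-8 for inputs shorter than four bytes is my own assumption. Note that the sketch assumes there is no byte order mark: the 'utf16' codec used in the session above prepends one, so that exact payload would still not match any of the RFC's patterns.

```python
import json

# Null patterns of the first four octets (True means the octet is 0x00).
_NULL_PATTERNS = {
    (True, True, True, False): "utf-32-be",   # 00 00 00 xx
    (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
    (False, True, True, True): "utf-32-le",   # xx 00 00 00
    (False, True, False, True): "utf-16-le",  # xx 00 xx 00
}

def guess_json_encoding(data):
    """Hypothetical helper: guess the encoding of a BOM-less JSON payload."""
    head = bytes(data[:4])
    if len(head) < 4:
        # Assumption: too short to apply the pattern, so use the default.
        return "utf-8"
    pattern = tuple(octet == 0 for octet in head)
    # Anything that matches no pattern is treated as UTF-8.
    return _NULL_PATTERNS.get(pattern, "utf-8")

def loads_detect(data):
    """Hypothetical helper: detect the encoding, decode, then parse."""
    return json.loads(bytes(data).decode(guess_json_encoding(data)))
```

With this, loads_detect('"hi"'.encode('utf-16-le')) parses successfully, whereas a payload carrying a BOM would need separate handling before the pattern check.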