On 03:21 am, firstname.lastname@example.org wrote:
Barry Warsaw wrote:
I don't know whether the parameter thing will work or not, but you're probably right that we need to get the bytes-everywhere API first.
Given that json is a wire protocol, that sounds like the right approach for json as well. Once bytes-everywhere works, then a text API can be built on top of it, but it is difficult to build a bytes API on top of a text one.
I wish I could agree, but JSON isn't really a wire protocol. According to http://www.ietf.org/rfc/rfc4627.txt JSON is "a text format for the serialization of structured data". There are some notes about encoding, but it is very clearly described in terms of unicode code points.
So I guess the IO library is the right model: bytes at the bottom of the stack, with text as a wrapper around it (mediated by codecs).
In email's case this is true, but in JSON's case it's not. JSON is a format defined as a sequence of code points; MIME is defined as a sequence of octets.
What is the 'bytes support' issue for json? Is it about content within a json text? Or about the transport format of a json text?
Reading rfc4627, a json text is a unicode string representation of an instance of one of 6 classes. In Python terms, they are NoneType, bool, numbers (int, float, decimal?), (unicode) str, list, and [string-keyed] dict. The representation is nearly identical to Python's literals and displays.
For transport, the encoding SHALL be one of UTF-8, UTF-16LE/BE, or UTF-32LE/BE, with UTF-8 the 'default'.
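As a quick check of that correspondence, the stdlib json module maps each kind of json value to one Python type. This is a sketch only; note that json.loads also accepts bare scalars at the top level, which is looser than rfc4627's requirement that a json text be an object or array.

```python
import json

# One json value per class; json.loads maps each to a single Python type.
samples = ['null', 'true', 'false', '42', '4.2', '"text"', '[1, 2]', '{"k": 1}']
for s in samples:
    value = json.loads(s)
    print(f"{s!r:12} -> {type(value).__name__}")
```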
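Section 3 of rfc4627 also explains how a receiver can tell these encodings apart: the first two characters of a json text are always ASCII, so the pattern of null octets in the first four bytes identifies the encoding. A minimal sketch of that table, using a hypothetical helper name `detect_json_encoding`:

```python
import json

# Null-octet patterns from rfc4627 section 3 (True = octet is 0x00).
_PATTERNS = {
    (True, True, True, False): "utf-32-be",   # 00 00 00 xx
    (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
    (False, True, True, True): "utf-32-le",   # xx 00 00 00
    (False, True, False, True): "utf-16-le",  # xx 00 xx 00
}

def detect_json_encoding(data: bytes) -> str:
    """Guess the encoding of a json byte string from its first four octets."""
    head = data[:4]
    if len(head) < 4:
        # Too short to match any UTF-16/32 pattern; use the default.
        return "utf-8"
    nulls = tuple(octet == 0 for octet in head)
    return _PATTERNS.get(nulls, "utf-8")  # xx xx xx xx -> UTF-8

for enc in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"):
    payload = '["x"]'.encode(enc)
    text = payload.decode(detect_json_encoding(payload))
    print(enc, json.loads(text))
```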
So a json parser (a restricted eval()) tokenizes and parses a stream of unicode chars which in Python could come from either a unicode string or a decoded bytes object. The bytes decoding could be either bulk or incremental.
Similarly, a json generator (a repr()-like function) produces a stream of unicode chars which again could be optionally encoded to bytes, either incrementally or in bulk.
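Both decoding styles can be sketched with the stdlib codecs module. The stdlib json parser needs the complete text, so in this sketch only the decoding is incremental, not the parsing; the 3-byte chunk size is an arbitrary choice that deliberately splits UTF-16 code units.

```python
import codecs
import json

payload = '{"greeting": "h\u00e9llo", "n": [1, 2, 3]}'.encode("utf-16-le")

# Bulk: decode the whole byte string at once, then parse the text.
bulk = json.loads(payload.decode("utf-16-le"))

# Incremental: bytes arrive in arbitrary chunks; the incremental decoder
# buffers any partial character until the rest of it arrives.
decoder = codecs.getincrementaldecoder("utf-16-le")()
chunks = [payload[i:i + 3] for i in range(0, len(payload), 3)]
text = "".join(decoder.decode(chunk) for chunk in chunks)
text += decoder.decode(b"", final=True)
incremental = json.loads(text)

assert bulk == incremental
print(incremental)
```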
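The generator side mirrors this. In the stdlib, JSONEncoder.iterencode yields the text as a series of str fragments, which an incremental encoder can turn into bytes as they are produced. A sketch, using ensure_ascii=False so that the UTF-8 step actually has non-ASCII characters to encode:

```python
import codecs
import json

data = {"name": "caf\u00e9", "values": [1, 2.5, None, True]}

# iterencode yields the json text piece by piece as str fragments.
fragments = json.JSONEncoder(ensure_ascii=False).iterencode(data)

# Incremental: encode each fragment to bytes as soon as it is produced.
encoder = codecs.getincrementalencoder("utf-8")()
chunks = [encoder.encode(fragment) for fragment in fragments]
chunks.append(encoder.encode("", final=True))
incremental = b"".join(chunks)

# Bulk: build the whole text first, then encode it once.
bulk = json.dumps(data, ensure_ascii=False).encode("utf-8")

assert incremental == bulk
```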
The standard does not specify any correspondence between representations and domain objects. For Python, making 'null', 'true', and 'false' inter-convert with None, True, False is obvious. Numbers are slightly more problematic. A generator could produce decimal literals from both floats and decimals, but without a non-json extension, a parser could only convert back to one, so the other would not round-trip. (Int could be handled by the presence or absence of '.0'.) Similarly, tuples could be represented, like lists, as json square-bracketed arrays, but they would be converted back to lists, not tuples, unless a non-json extension were used.
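The stdlib json module shows both round-trip limits. Ints and floats are told apart by the '.0', tuples come back as lists, and a number with a fraction can be parsed into only one Python type per call, chosen with parse_float. A short sketch:

```python
import json
from decimal import Decimal

original = {"count": 3, "ratio": 2.0, "point": (1, 2)}

text = json.dumps(original)
print(text)

restored = json.loads(text)
# The int and the float survive, distinguished by the '.0' ...
assert isinstance(restored["count"], int)
assert isinstance(restored["ratio"], float)
# ... but the tuple comes back as a list.
assert restored["point"] == [1, 2] and restored["point"] != (1, 2)

# The parser must pick one type for '2.0'; parse_float selects Decimal here.
as_decimal = json.loads(text, parse_float=Decimal)
assert as_decimal["ratio"] == Decimal("2.0")
```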
So the two possible bytes-support content issues I see are how to represent bytes as legal json strings and/or whether some device should be added to make them round-trip. But as indicated above, these two issues are not unique to bytes.
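One possible device, sketched here purely as an assumption: the `{"__bytes__": ...}` tag and the hook names below are hypothetical and not part of rfc4627 or the json module. Bytes are written as a one-key object holding base64 text, using json's real `default` and `object_hook` parameters. The output is legal json, but a consumer without the hook just sees an ordinary object, which is exactly the cost of a non-json extension. The same trick would work for tuples.

```python
import base64
import json

# Hypothetical convention: bytes become {"__bytes__": "<base64 text>"}.
BYTES_TAG = "__bytes__"

def encode_extra(obj):
    """`default` hook: represent bytes as a tagged, legal json object."""
    if isinstance(obj, (bytes, bytearray)):
        return {BYTES_TAG: base64.b64encode(obj).decode("ascii")}
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")

def decode_extra(dct):
    """`object_hook`: turn tagged objects back into bytes."""
    if set(dct) == {BYTES_TAG}:
        return base64.b64decode(dct[BYTES_TAG])
    return dct

payload = {"name": "blob", "data": b"\x00\xff\x10"}
text = json.dumps(payload, default=encode_extra)
restored = json.loads(text, object_hook=decode_extra)
assert restored == payload
```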
Terry Jan Reedy