[Python-Dev] Dropping bytes "support" in json

Thu Apr 9 22:19:43 CEST 2009

Alexandre Vassalotti wrote:
> On Thu, Apr 9, 2009 at 1:15 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>> As for reading/writing bytes over the wire, JSON is often used in the same
>> context as HTML: you are supposed to know the charset and decode/encode the
>> payload using that charset. However, the RFC specifies a default encoding of
>> utf-8. (*)
>>
>>
>> (*) http://www.ietf.org/rfc/rfc4627.txt
>>
> 
> That is one short and sweet RFC. :-)

It is indeed well-specified. Unfortunately, it only covers the
application/json type; the other MIME types that were already in use
for JSON vary widely, such as text/plain (possibly with a charset=
parameter), text/json, or text/javascript. For these, the RFC doesn't
apply.
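As an illustration of this point, and of Antoine's remark that you are supposed to know the charset, the sketch below picks a decoding charset from a Content-Type value. It is hypothetical, not part of the json module: `choose_charset` and `decode_json` are illustrative names. It treats application/json as plain UTF-8 (ignoring the UTF-16/UTF-32 variants RFC 4627 also permits), and the UTF-8 fallback for the older types when no charset= parameter is given is an assumption, not something any RFC mandates. Header parsing uses the standard library's `email.message.Message`.

```python
import json
from email.message import Message


def choose_charset(content_type: str, fallback: str = "utf-8") -> str:
    """Pick a charset for decoding a JSON payload (hypothetical helper)."""
    msg = Message()
    msg["Content-Type"] = content_type
    if msg.get_content_type() == "application/json":
        # RFC 4627 covers this type; simplification: assume UTF-8 and
        # ignore the UTF-16/UTF-32 encodings the RFC also allows.
        return "utf-8"
    # text/plain, text/json, text/javascript, ...: the RFC does not apply,
    # so honour an explicit charset= parameter (lowercased by the email
    # package) and otherwise use an assumed caller-chosen fallback.
    return msg.get_content_charset() or fallback


def decode_json(payload: bytes, content_type: str):
    """Decode the bytes with the chosen charset, then parse the JSON text."""
    return json.loads(payload.decode(choose_charset(content_type)))
```

For example, a body served as `text/plain; charset=ISO-8859-1` is decoded as Latin-1, while a bare `text/javascript` body falls back to UTF-8 under the assumed policy.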

> Given the RFC specifies that the encoding used should be one of the
> encodings defined by Unicode, wouldn't it be a better idea to remove the
> "unicode" support instead? To me, it would make sense to use the
> detection algorithms for Unicode to sniff the encoding of the JSON
> stream and then use the detected encoding to decode the strings embedded
> in the JSON stream.

That might be reasonable. (But then, I also stand by my view that we
shouldn't proceed without Bob's approval.)
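The sniffing idea can be sketched with the detection scheme from section 3 of RFC 4627: since a JSON text begins with two ASCII characters, the pattern of zero bytes in its first four octets tells UTF-8, UTF-16 and UTF-32 (and their byte order) apart. This is a minimal sketch, not the json module's behaviour; `sniff_json_encoding` and `loads_sniffed` are hypothetical names, and like the RFC's table it does not handle byte-order marks.

```python
import json


def sniff_json_encoding(data: bytes) -> str:
    """Guess the encoding of a JSON text from the null-byte pattern of its
    first four octets, following the table in RFC 4627, section 3."""
    head = data[:4]
    if len(head) < 4:
        # Too short to be UTF-16/UTF-32 JSON under RFC 4627; assume UTF-8.
        return "utf-8"
    nulls = tuple(b == 0 for b in head)
    patterns = {
        (True, True, True, False): "utf-32-be",   # 00 00 00 xx
        (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
        (False, True, True, True): "utf-32-le",   # xx 00 00 00
        (False, True, False, True): "utf-16-le",  # xx 00 xx 00
    }
    # Any other pattern (typically xx xx xx xx) is treated as UTF-8.
    return patterns.get(nulls, "utf-8")


def loads_sniffed(data: bytes):
    """Decode bytes with the sniffed encoding, then parse them as JSON."""
    return json.loads(data.decode(sniff_json_encoding(data)))
```

With this approach the caller only ever passes bytes, and the decoding step happens inside the JSON layer rather than being the caller's responsibility.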

Regards,
Martin