[Python-Dev] Dropping bytes "support" in json

Alexandre Vassalotti alexandre at peadrop.com
Thu Apr 9 21:51:15 CEST 2009

On Thu, Apr 9, 2009 at 1:15 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> As for reading/writing bytes over the wire, JSON is often used in the same
> context as HTML: you are supposed to know the charset and decode/encode the
> payload using that charset. However, the RFC specifies a default encoding of
> utf-8. (*)
> (*) http://www.ietf.org/rfc/rfc4627.txt

That is one short and sweet RFC. :-)

> The RFC also specifies a discrimination algorithm for non-supersets of ASCII
> (“Since the first two characters of a JSON text will always be ASCII
>   characters [RFC0020], it is possible to determine whether an octet
>   stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
>   at the pattern of nulls in the first four octets.”), but it is not
> implemented in the json module:

Given that the RFC specifies that the encoding used should be one of the
encodings defined by Unicode, wouldn't it be a better idea to remove the
"unicode" support instead? To me, it would make sense to use the
Unicode detection algorithm to sniff the encoding of the JSON
stream and then use the detected encoding to decode the strings embedded
in the JSON stream.
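A minimal sketch of that idea, assuming RFC 4627's null-pattern table (section 3) as the detection rule: sniff the first four octets, pick one of the five Unicode encodings, decode, then hand the resulting str to json.loads. The helper names detect_json_encoding and loads_bytes are hypothetical, not part of the json module, and the sketch deliberately ignores byte order marks. An explicitly known charset (e.g. from a Content-Type header) is assumed to take precedence over sniffing.

```python
import json


def detect_json_encoding(data: bytes) -> str:
    """Guess the encoding of a JSON text from its first four octets.

    Hypothetical helper following the RFC 4627 null-byte table; it assumes
    the first two characters are ASCII and does not handle a BOM.
    """
    head = data[:4]
    if len(head) >= 4:
        if head[0] == 0 and head[1] == 0 and head[2] == 0:
            return "utf-32-be"  # 00 00 00 xx
        # Checked before UTF-16-LE, since xx 00 00 00 also has head[1] == 0.
        if head[1] == 0 and head[2] == 0 and head[3] == 0:
            return "utf-32-le"  # xx 00 00 00
    if len(head) >= 2:
        if head[0] == 0:
            return "utf-16-be"  # 00 xx 00 xx
        if head[1] == 0:
            return "utf-16-le"  # xx 00 xx 00
    return "utf-8"  # xx xx xx xx (also the RFC default)


def loads_bytes(data: bytes, charset=None):
    """Decode a bytes payload, then parse it (hypothetical wrapper).

    A caller-supplied charset wins; otherwise the encoding is sniffed.
    """
    encoding = charset or detect_json_encoding(data)
    return json.loads(data.decode(encoding))
```

For example, a payload produced with json.dumps(...).encode("utf-16-le") is detected as utf-16-le and parsed back to the original object without the caller naming the encoding. For comparison, Python 3.6 and later let json.loads itself accept bytes encoded in UTF-8, UTF-16 or UTF-32.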

-- Alexandre
