[Python-Dev] Dropping bytes "support" in json

Bob Ippolito bob at redivi.com
Mon Apr 13 22:28:26 CEST 2009


On Mon, Apr 13, 2009 at 1:02 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> Yes, there's a TCP connection.  Sorry for not making that clear to begin
>> with.
>>
>>     If so, it doesn't matter what representation these implementations chose
>>     to use.
>>
>>
>> True, I can always convert from bytes to str or vice versa.
>
> I think you are missing the point. It will not be necessary to convert.
> You can write the JSON into the TCP connection in Python, and it will
> come out just fine as strings in C# and JavaScript. This
> is how middleware works - it abstracts from programming languages, and
> allows for different representations in different languages, in a
> manner invisible to the participating processes.
>
>> At least one of these two needs to work:
>>
>> json.dumps({}).encode('utf-16le')  # dumps() returns str
>> '{\x00}\x00'
>>
>> json.dumps({}, encoding='utf-16le')  # dumps() returns bytes
>> '{\x00}\x00'
>>
>> In 2.6, the first one works.  The second incorrectly returns '{}'.
>
> Ok, that might be a bug in the JSON implementation - but you shouldn't
> be using utf-16le, anyway. Use UTF-8 always, and it will work fine.
>
> The question is which of them is more appropriate if what you want
> is bytes. I argue that the second form is better, since it saves you
> an encode invocation.

It's not a bug in dumps; it's a matter of not reading the
documentation. The encoding parameter of dumps decides how byte
strings in the input should be interpreted, not what the output
encoding is.
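
For anyone trying this on Python 3, where json.dumps takes no
encoding parameter, the same input-side behaviour can be approximated
by decoding byte strings before serializing. This is only a sketch:
decode_byte_strings is a hypothetical helper, not part of json or
simplejson, and it handles only dicts, lists, and tuples.

```python
import json


def decode_byte_strings(obj, encoding="utf-8"):
    """Recursively decode bytes inside obj (hypothetical helper).

    Mirrors what the 2.x `encoding` parameter does: it says how *input*
    byte strings are to be read, and has no effect on the output encoding.
    """
    if isinstance(obj, bytes):
        return obj.decode(encoding)
    if isinstance(obj, dict):
        return {
            decode_byte_strings(key, encoding): decode_byte_strings(value, encoding)
            for key, value in obj.items()
        }
    if isinstance(obj, (list, tuple)):
        return [decode_byte_strings(item, encoding) for item in obj]
    return obj


# The input bytes are UTF-16BE; the serialized result is ordinary text.
text = json.dumps(decode_byte_strings([b"\x00f\x00o\x00o"], "utf-16be"))

# Choosing an output encoding is a separate, explicit step.
wire_bytes = text.encode("utf-16le")
```

The output encoding is picked by the caller with a plain encode call,
which is the first of the two forms quoted above.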

The output of json/simplejson dumps for Python 2.x is either an ASCII
bytestring (default) or a unicode string (when ensure_ascii=False).
This is very practical in 2.x because an ASCII bytestring can be
treated as either text or bytes in most situations, isn't going to get
mangled by an encoding mismatch (as long as the encoding on the other
end is an ASCII superset), and skips an encoding step when it is sent
over the wire.

>>> simplejson.dumps(['\x00f\x00o\x00o'], encoding='utf-16be')
'["foo"]'
>>> simplejson.dumps(['\x00f\x00o\x00o'], encoding='utf-16be', ensure_ascii=False)
u'["foo"]'
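
The point about ASCII output surviving an encoding mismatch can be
checked with the Python 3 json module as well. A small sketch, assuming
a made-up payload and an arbitrary handful of ASCII-superset codecs:

```python
import json

data = {"name": "L\u00f6wis"}  # sample payload, chosen only for illustration

# Default (ensure_ascii=True): non-ASCII characters are written as \uXXXX
# escapes, so the result contains ASCII characters only.
ascii_text = json.dumps(data)

# ensure_ascii=False keeps the character itself in the output string.
raw_text = json.dumps(data, ensure_ascii=False)

# Bytes as they would go over the wire.
wire = ascii_text.encode("ascii")

# A receiver guessing any ASCII-superset encoding gets the same text back.
decoded = {
    codec: wire.decode(codec)
    for codec in ("ascii", "utf-8", "latin-1", "cp1252")
}
round_tripped = json.loads(decoded["latin-1"])
```

With ensure_ascii=False the receiver has to know the real encoding,
since the raw character encodes differently under each codec.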

-bob

