[Python-Dev] Dropping bytes "support" in json

Sun Apr 12 04:29:24 CEST 2009

On 11/04/2009 6:12 PM, Antoine Pitrou wrote:
> Martin v. Löwis <martin at v.loewis.de> writes:
>> Not sure whether it would be *significantly* faster, but yes, Bob wrote
>> an accelerator for parsing out of a byte string to make it really fast;
>> IIRC, he claims that it is faster than pickling.
>
> Isn't premature optimization the root of all evil?
>
> Besides, the fact that many values in a typical JSON object will be strings, and
> must be encoded from/decoded to unicode objects in py3k, suggests that
> accepting/outputting unicode as default is the laziest (i.e. the best) choice
> performance-wise.

I don't see it as premature optimization, but rather as trying to ensure
the interface/API best suits the actual use cases.

> But you don't have to trust me: look at the quick numbers I've posted. The py3k
> version (in the str-only incarnation I've proposed) is sometimes actually faster
> than the trunk version:
> http://mail.python.org/pipermail/python-dev/2009-April/088498.html

But if all *actual* use-cases involve moving to and from UTF-8 encoded
bytes, I'm not sure that little example is particularly useful.  In
those use-cases, I'd be surprised if there weren't significant time and
space benefits in not asking apps to go through an 'intermediate' string
object before getting the bytes they need, particularly when the payload
may be large.
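
To make the 'intermediate' string concrete, here is a minimal sketch of
the round trip an app would do with a str-only json API when its data
goes over the wire as UTF-8.  The payload and the variable names are
made up for illustration, and this says nothing about how much the extra
step actually costs:

```python
import json

# Hypothetical payload; any JSON-serializable object would do.
payload = {"name": "Löwis", "tags": ["py3k", "json"]}

# Sending side with a str-only API: serialize to str, then encode.
text = json.dumps(payload, ensure_ascii=False)  # the intermediate str object
wire = text.encode("utf-8")                     # the bytes actually sent

# Receiving side: decode the bytes to str before parsing.
received = json.loads(wire.decode("utf-8"))
```

Each direction builds a full str copy of the document in addition to the
bytes object, which is the overhead in question for large payloads.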

Assuming the above is all true, I'd see choosing bytes less as a
premature optimization and more as a design choice that best supports
actual use.  So to my mind the only real question is whether the above
*is* true, or whether there are common use-cases which don't involve
UTF-8 off/on the wire...

Cheers,

Mark