[Python-Dev] Dropping bytes "support" in json

Damien Diederen dd at crosstwine.com
Mon Apr 27 18:21:15 CEST 2009

Hi Antoine,

Antoine Pitrou <solipsis at pitrou.net> writes:
> Damien Diederen <dd <at> crosstwine.com> writes:
>> I couldn't figure out a way to get rid of it short of multi-#including
>> "templates" and playing with the C preprocessor, however, and have the
>> nagging feeling the latter would be frowned upon by the maintainers.
>> There is a precedent with xmltok.c/xmltok_impl.c, though, so maybe I'm
>> wrong about that.  Should I give it a try, and see how "clean" the
>> result can be made?
> Keep in mind that json is externally maintained by Bob. The more we rework his
> code, the less easy it will be to backport other changes from the simplejson
> library.
> I think we should either keep the code duplication (if we want to keep fast
> paths for both bytes and str objects), or only keep one of the two versions as
> my patch does.

Yes, I was (slowly) reaching the same conclusion.

>> Provided one of the alternatives is dropped, wouldn't it be better to do
>> the opposite, i.e., have the decoder take bytes as input, and the
>> encoder produce bytes—and layer the str functionality on top of that?  I
>> guess the answer depends on how the (most common) lower layers are
>> structured, but it would be nice to allow a straight bytes path to/from
>> the underlying transport.
> The straightest path is actually to/from unicode, since JSON data can contain
> unicode strings but no byte strings. Also, the json library /has/ to output
> unicode when `ensure_ascii` is False. In 2.x:
> >>> json.dumps([u"éléphant"], ensure_ascii=False)
> u'["\xe9l\xe9phant"]'
> In any case, I don't think it will matter much in terms of speed
> whether we take one route or the other. UTF-8 encoding/decoding is
> probably much faster (in characters per second) than JSON
> encoding/decoding is.

You're undoubtedly right.  I was more concerned about the interaction
with other modules, and about avoiding unnecessary copies/conversions,
especially when they don't make sense from the user's perspective.
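
For illustration, here is roughly how the `ensure_ascii` behaviour quoted
above plays out with str objects in 3.x.  This is only a sketch: the
variable names are made up for the example, and the explicit UTF-8 step
stands in for whatever the transport layer would do.

```python
import json

# With ensure_ascii=False the encoder keeps non-ASCII characters as-is,
# so the result has to be a text (str) object, not bytes.
text = json.dumps(["éléphant"], ensure_ascii=False)

# With the default ensure_ascii=True, non-ASCII characters are escaped,
# and the result is pure ASCII (but still a str).
escaped = json.dumps(["éléphant"])

# Getting bytes for a transport is a separate, explicit encoding step.
wire = text.encode("utf-8")

# The reverse path: decode the bytes, then parse the text.
roundtrip = json.loads(wire.decode("utf-8"))
```

The extra encode/decode pass is the copy/conversion in question; it is
linear in the size of the document.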

I will whip up a patch adding a {loadb,dumpb} API as you suggested in
another email, with the most trivial implementation, and then we'll see
where to go from there.

It can still be dropped if there is a concern of perpetuating a "bad
idea," or I can follow up with a port of Bob's "bytes" implementation
from 2.x if there is any interest.
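
A minimal sketch of what the "most trivial implementation" could look
like, layering bytes on top of the existing str functions.  Only the
names loadb/dumpb come from the discussion; the `encoding` parameter,
its UTF-8 default, and the keyword pass-through are assumptions made for
this example, not a description of any actual patch.

```python
import json

def dumpb(obj, encoding="utf-8", **kwargs):
    """Serialize obj to JSON and return the result as bytes.

    Hypothetical helper: encodes the str produced by json.dumps().
    """
    return json.dumps(obj, **kwargs).encode(encoding)

def loadb(data, encoding="utf-8", **kwargs):
    """Deserialize a bytes object containing a JSON document.

    Hypothetical helper: decodes to str, then calls json.loads().
    """
    return json.loads(data.decode(encoding), **kwargs)
```

Used this way, a caller holding bytes from a socket or file never has to
touch str explicitly, e.g. `loadb(dumpb({"a": [1, 2]}))` gives back the
original dict, at the cost of one intermediate str copy in each
direction.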

> Regards
> Antoine.



"Strong Opinions, Weakly Held"
                 -- Bob Johansen
