[Python-Dev] Dropping bytes "support" in json

Mon Apr 27 17:24:29 CEST 2009

Damien Diederen <dd <at> crosstwine.com> writes:
> 
> I couldn't figure out a way to get rid of it short of multi-#including
> "templates" and playing with the C preprocessor, however, and have the
> nagging feeling the latter would be frowned upon by the maintainers.
> 
> There is a precedent with xmltok.c/xmltok_impl.c, though, so maybe I'm
> wrong about that.  Should I give it a try, and see how "clean" the
> result can be made?

Keep in mind that json is externally maintained by Bob. The more we rework his
code, the harder it will be to backport other changes from the simplejson
library.

I think we should either keep the code duplication (if we want to keep fast
paths for both bytes and str objects), or only keep one of the two versions as
my patch does.

> Provided one of the alternatives is dropped, wouldn't it be better to do
> the opposite, i.e., have the decoder take bytes as input, and the
> encoder produce bytes—and layer the str functionality on top of that?  I
> guess the answer depends on how the (most common) lower layers are
> structured, but it would be nice to allow a straight bytes path to/from
> the underlying transport.

The straightest path is actually to/from unicode, since JSON data can contain
unicode strings but no byte strings. Also, the json library /has/ to output
unicode when `ensure_ascii` is False. In 2.x:

>>> json.dumps([u"éléphant"], ensure_ascii=False)
u'["\xe9l\xe9phant"]'
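For comparison, here is a minimal sketch of what "layering bytes on top of str" could look like, assuming Python 3 and a str-based json module. The helper names `dumps_bytes` and `loads_bytes` are hypothetical and not part of the json API; they only wrap `json.dumps`/`json.loads` with an explicit UTF-8 step.

```python
import json


def dumps_bytes(obj):
    """Hypothetical helper: serialize to str first, then encode for the transport."""
    text = json.dumps(obj, ensure_ascii=False)  # returns str, non-ASCII kept as-is
    return text.encode("utf-8")


def loads_bytes(data):
    """Hypothetical helper: decode transport bytes, then parse the resulting str."""
    return json.loads(data.decode("utf-8"))


payload = dumps_bytes(["éléphant"])   # bytes, ready to be written to a socket or file
roundtrip = loads_bytes(payload)      # back to a list containing one str
```

The bytes layer here is a single encode/decode call around the unicode path, which is why dropping the dedicated bytes fast path would cost little in functionality.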

In any case, I don't think it will matter much in terms of speed whether we take
one route or the other. UTF-8 encoding/decoding is probably much faster (in
characters per second) than JSON encoding/decoding is.
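A rough way to check this on a given interpreter is sketched below, assuming Python 3. The sample data and repeat count are arbitrary choices, and the numbers will vary by machine and version, so treat it as a quick sanity check rather than a benchmark.

```python
import json
import timeit

# Arbitrary sample: a list of short non-ASCII strings.
data = ["éléphant"] * 10000
text = json.dumps(data, ensure_ascii=False)

repeats = 20

# Time JSON encoding of the Python object to a str.
json_time = timeit.timeit(
    lambda: json.dumps(data, ensure_ascii=False), number=repeats
)

# Time UTF-8 encoding of the already-serialized str.
utf8_time = timeit.timeit(lambda: text.encode("utf-8"), number=repeats)

print("json.dumps:   %.6f s for %d runs" % (json_time, repeats))
print("UTF-8 encode: %.6f s for %d runs" % (utf8_time, repeats))
```

If the UTF-8 step turns out to be a small fraction of the JSON step, the extra encode/decode on a str-only path should not dominate.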

Regards

Antoine.