Re: [Python-Dev] Dropping bytes "support" in json

27 Apr 2009

Damien Diederen writes:
...
> I couldn't figure out a way to get rid of it short of multi-#including
> "templates" and playing with the C preprocessor, however, and have the
> nagging feeling the latter would be frowned upon by the maintainers.
> There is a precedent with xmltok.c/xmltok_impl.c, though, so maybe I'm
> wrong about that.  Should I give it a try, and see how "clean" the
> result can be made?

Keep in mind that json is externally maintained by Bob. The more we rework his
code, the less easy it will be to backport other changes from the simplejson
library.

I think we should either keep the code duplication (if we want to keep fast
paths for both bytes and str objects), or only keep one of the two versions as
my patch does.
...
> Provided one of the alternatives is dropped, wouldn't it be better to do
> the opposite, i.e., have the decoder take bytes as input, and the
> encoder produce bytes -- and layer the str functionality on top of that?  I
> guess the answer depends on how the (most common) lower layers are
> structured, but it would be nice to allow a straight bytes path to/from
> the underlying transport.

The straightest path is actually to/from unicode, since JSON data can contain
unicode strings but no byte strings. Also, the json library /has/ to output
unicode when `ensure_ascii` is False. In 2.x:
...
...
...
>>> json.dumps([u"éléphant"], ensure_ascii=False)
u'["\xe9l\xe9phant"]'
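For comparison, here is a minimal sketch of the same call under Python 3, assuming the current standard-library `json` module (the variable names are only illustrative). There, `json.dumps` returns `str` in both modes, and `ensure_ascii` only decides whether non-ASCII characters are escaped:

```python
import json

data = ["éléphant"]

# Default (ensure_ascii=True): non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(data)

# ensure_ascii=False: non-ASCII characters are written through unchanged.
raw = json.dumps(data, ensure_ascii=False)

print(type(escaped).__name__, escaped)
print(type(raw).__name__, raw)

# Both forms parse back to the same list of text strings.
print(json.loads(escaped) == json.loads(raw) == data)
```

Either way the result is text, which is consistent with the point that JSON carries unicode strings and has no byte-string type.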
In any case, I don't think it will matter much in terms of speed whether we take
one route or the other. UTF-8 encoding/decoding is probably much faster (in
characters per second) than JSON encoding/decoding is.
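
As a rough illustration of the "unicode core, bytes layered on top" arrangement, the sketch below wraps the text-based API with a UTF-8 step on each side. The helpers `dumps_bytes`, `loads_bytes` and `measure` are hypothetical names, not part of `json` or simplejson, and the code assumes Python 3. The timing function is only a way to check the speed guess on a given machine; it is not evidence for it.

```python
import json
import timeit


def dumps_bytes(obj, **kwargs):
    """Hypothetical helper: serialize to text, then encode it as UTF-8."""
    return json.dumps(obj, ensure_ascii=False, **kwargs).encode("utf-8")


def loads_bytes(payload, **kwargs):
    """Hypothetical helper: decode UTF-8 bytes, then parse the text."""
    return json.loads(payload.decode("utf-8"), **kwargs)


def measure(number=1000):
    """Time the UTF-8 encoding step against JSON serialization.

    Returns (encode_seconds, dumps_seconds) for `number` repetitions.
    Results depend on the machine and the Python version.
    """
    obj = ["éléphant"] * 1000
    text = json.dumps(obj, ensure_ascii=False)
    encode_s = timeit.timeit(lambda: text.encode("utf-8"), number=number)
    dumps_s = timeit.timeit(
        lambda: json.dumps(obj, ensure_ascii=False), number=number
    )
    return encode_s, dumps_s


# Bytes go out to the transport and come back unchanged in meaning.
wire = dumps_bytes({"animal": "éléphant"})
print(wire)
print(loads_bytes(wire))

encode_s, dumps_s = measure()
print("utf-8 encode: %.4fs, json.dumps: %.4fs" % (encode_s, dumps_s))
```

With this layering the transport still sees plain bytes, while the encoder and decoder themselves only ever deal with text.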

Regards

Antoine.

Antoine Pitrou