[Chicago] BSON

Alex Gaynor alex.gaynor at gmail.com
Sun Aug 28 03:28:56 CEST 2011


On Sat, Aug 27, 2011 at 9:23 PM, Christopher Allan Webber <
cwebber at dustycloud.org> wrote:

> I've wondered if the serialization / deserialization might be somewhat
> faster, which could be useful if you're using bson over json in
> simple-message passing in something like a multi-process actor model.  I
> haven't done those benchmarks but I'd love to see the results.
>
> Kumar McMillan <kumar.mcmillan at gmail.com> writes:
>
> > On Wed, Aug 24, 2011 at 11:25 AM, Joshua Herman
> > <zitterbewegung at gmail.com> wrote:
> >> Would anyone be interested in a talk about this? http://bsonspec.org/
> >> It's a binary version of JSON. The advantages are fast scanability and
> >> easy C representation. It is also schemaless.
> >
> > Disadvantages: you can't read it while trying to debug your web page.
> > Seriously, this would be a big deal for me. I'd have to be convinced
> > that the binary format is smaller than a gzipped JSON response. But as
> > Tal said maybe the advantages are more in its data structuring
> > flexibility. Sounds interesting nonetheless, +1 for a talk.
> >
> >>
> >>
> >> ---Profile:---
> >> http://www.google.com/profiles/zitterbewegung
> >> _______________________________________________
> >> Chicago mailing list
> >> Chicago at python.org
> >> http://mail.python.org/mailman/listinfo/chicago
> >>
> >
>
> --
> 𝓒𝓱𝓻𝓲𝓼𝓽𝓸𝓹𝓱𝓮𝓻 𝓐𝓵𝓵𝓪𝓷 𝓦𝓮𝓫𝓫𝓮𝓻
>

If done right (and I haven't looked at BSON at all), it almost certainly could
be faster (or at least some binary version of JSON could be).  For example, you
could encode the number of bytes each string takes at the start of that
string.  Then a deserializer could just allocate a string of the right length
and memcpy.  Right now JSON parsers have to scan the string looking for the
end (taking into account escapes, funny unicode business, and whatever else)
and then do a second pass to actually copy the data.  Ints and floats can be
encoded much more efficiently: reading them becomes a simple reinterpretation
of memory rather than an actual parse.  And I'm sure there are other ways.
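
Here is a rough sketch of those two ideas in Python, using the standard
struct module.  The layout (a 4-byte little-endian length prefix for strings,
a one-byte type tag followed by 8 bytes for numbers) and all the function
names are made up for illustration; this is not BSON's actual wire format.

```python
import struct


def encode_string(text):
    """Prefix the UTF-8 bytes with their length as a 4-byte unsigned int."""
    data = text.encode("utf-8")
    return struct.pack("<I", len(data)) + data


def decode_string(buf, offset=0):
    """Read the length, then slice exactly that many bytes.

    There is no scan for a closing quote and no escape handling: the
    decoder knows up front how much to copy.
    """
    (length,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4
    end = start + length
    if end > len(buf):
        raise ValueError("buffer is shorter than the declared length")
    return buf[start:end].decode("utf-8"), end


# Hypothetical one-byte type tags mapped to fixed-width struct formats.
NUMBER_FORMATS = {b"i": "<q", b"d": "<d"}  # int64, float64


def encode_number(value):
    """Store ints as 8-byte signed integers and floats as 8-byte doubles."""
    if isinstance(value, int):
        return b"i" + struct.pack("<q", value)
    return b"d" + struct.pack("<d", float(value))


def decode_number(buf, offset=0):
    """Read the tag, then unpack 8 bytes directly; no digit-by-digit parse."""
    tag = buf[offset:offset + 1]
    fmt = NUMBER_FORMATS[tag]
    (value,) = struct.unpack_from(fmt, buf, offset + 1)
    return value, offset + 1 + struct.calcsize(fmt)
```

Each decoder returns the value plus the offset of the next field, so a
message can be walked in a single pass.  Whether this actually beats a tuned
JSON parser would still need measuring.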

Of course this all assumes it's a sensibly designed binary format, with a
decent deserializer.  Mileage may vary; this doesn't constitute legal,
medical, or retirement planning advice.
Alex

-- 
"I disapprove of what you say, but I will defend to the death your right to
say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero