Draft PEP on RSON configuration file format

Mon Mar 1 21:19:34 EST 2010

Kirill:

Thank you for your constructive criticism.  This is the gem that made
it worthwhile to post my document.  I think all of your points are
spot-on, and I will be fixing the documentation.

I can well believe that the C implementation of YAML is much faster
than the Python one, but I am aiming for something that will be
reasonably quick in pure Python.  I will double-check the JSON C test
results, but something I probably did not make clear is that the 22
seconds is not spent parsing -- that is for the entire test, which
involves reading restructured text and generating some 160 separate
PDF files.
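
To see how much of a run goes to parsing rather than to the rest of the
test, the parse step can be timed by itself.  The following is only a
minimal sketch, not the rst2pdf test harness: the generated document and
the names `make_config` and `time_parse` are hypothetical, introduced for
illustration, and only the standard-library `json` and `time` modules are
used.

```python
import json
import time


def make_config(n_keys=1000):
    """Build a hypothetical flat JSON document with one key per line."""
    data = {"key%d" % i: "value %d" % i for i in range(n_keys)}
    # indent=0 puts every key/value pair on its own line.
    return json.dumps(data, indent=0)


def time_parse(text, repeats=20):
    """Return the best wall-clock time, in seconds, of json.loads(text)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        json.loads(text)
        best = min(best, time.perf_counter() - start)
    return best


text = make_config()
parse_seconds = time_parse(text)
line_count = text.count("\n") + 1
print("parse only: %.6f s for %d lines" % (parse_seconds, line_count))
```

Subtracting a figure like this, multiplied by the number of files read,
from the total run time gives a rough idea of what the parser itself
costs.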

Best regards,
Pat

On Mon, Mar 1, 2010 at 8:02 PM, Kirill Simonov <xi at gamma.dn.ua> wrote:
> Patrick Maupin wrote:
>>
>> All:
>>
>> Finding .ini configuration files too limiting, JSON and XML too hard to
>> manually edit, and YAML too complex to parse quickly, I have started
>> work on a new configuration file parser.
>
> I'd like to note that with the optional libyaml bindings, the PyYAML parser
> is pretty fast.
>
>> I call the new format RSON (for "Readable Serial Object Notation"),
>> and it is designed to be a superset of JSON.
>>
>> I would love for it to be considered valuable enough to be a part of
>> the standard library, but even if that does not come to pass, I would
>> be very interested in feedback to help me polish the specification,
>> and then possibly help for implementation and testing.
>>
>> The documentation is in rst PEP form, at:
>>
>> http://rson.googlecode.com/svn/trunk/doc/draftpep.txt
>
> === cut ===
> Because YAML does allow for highly readable configuration files, it
> is tempting to overlook its other flaws for the task.  But a fully
> (or almost) compliant parser has to understand the whole YAML
> specification, and this is apparently expensive.  Running the rst2pdf
> testsuite, without sphinx or most of the other optional packages, in
> "fast" mode (preloading all the modules, and then forking for every
> test) generates 161 smallish PDF files, totaling around 2.5 MB.  On
> one test system this process takes 22 seconds.  Disabling the _json C
> scanner and reading the configuration files using the json pure Python
> implementation adds about 0.3 seconds to the 22 seconds.  But using
> pyyaml v. 3.09 instead of json adds 33 seconds to the 22 second process!
> It might seem that this is an edge case, but it makes it unacceptable to
> use YAML for this sort of testing, and taking 200 ms to read in 1000
> lines of simple JSON will be unacceptable in many other application
> domains as well.
> === cut ===
>
> I'd question your testing methodology.  From your description, it looks like
> the _json speedup was never enabled.  Also, PyYAML provides optional bindings
> to libyaml, which make parsing and emitting YAML much faster.  In my tests,
> it parses a 10 MB file in 3 sec.
>
> === cut ===
> RSON semantics are based on JSON.  Like JSON, an RSON document represents
> either a single scalar object, or a DAG (Directed Acyclic Graph), which
> may contain only a few simple data types.
> === cut ===
>
> JSON doesn't represent a DAG, at least, not an arbitrary DAG since each node
> in the document has no more than one parent.  It would be more accurate to
> say that it represents a tree-like structure.
>
> === cut ===
> The YAML syntax for supporting back-references was considered and deemed
> unsatisfactory. A human user who wants to put identical information in a
> "ship to" and "bill to" address is much more likely to use cut and paste
> than he is to understand and use backreferences, so the additional overhead
> of supporting more complex document structures is unwarranted.
>
> The concept of a "merge" in YAML, where two sub-trees of data can be
> merged together (similar to a recursive Python dictionary update)
> is quite useful, though, and will be copied.  This does not alter the
> outcome that parsing an RSON file will result in a DAG, but does give
> more flexibility in the syntax that can be used to achieve a particular
> output DAG.
> === cut ===
>
> This paragraph assumes the reader is familiar with intricate details of the
> YAML grammar and semantics.  I bet most of your audience are completely lost
> here.
>
> === cut ===
> Enhanced example::
>
>    key1/key2a
>        key3a = Some random string
>        key3b = 42
>    key1/key2a
>        key3c
>            1
>            2
>            {}
>                key4a = anything
>                key4b = something else
>            []
>                a
>                b
>                c
>            3
>            4
>    key1/key2b = [1, 2, 3, 4]
>    key5 = ""
>       This is a multi-line string.  It is
>          dedented to the farthest left
>          column that is indented from
>          the line containing "".
>    key6 = [""]
>       This is an array of strings, one per line.
>       Each string is dedented appropriately.
> === cut ===
>
> Frankly, this is an example that only a mother could love.  I'd suggest that
> you add some real-world examples, make sure they look nice, and put them in
> the introductory part of the document.  Examples are how the format will be
> evaluated by the readers, and yours don't stand a chance.
>
> Seriously, the only reason YAML enjoys its moderate popularity despite its
> overcomplicated grammar, chronic lack of manpower and deficient
> implementations is that it's so cute.
>
>
>
> Disclaimer: I'm the author of PyYAML and libyaml.
>
> Thanks,
> Kirill
>
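
The "merge" that the quoted draft describes as "similar to a recursive
Python dictionary update" can be sketched as below.  This is an
illustration under assumptions, not RSON's specified behaviour: the
function name `merge` is hypothetical, and the sample data is only loosely
modelled on the keys used in the enhanced example.

```python
def merge(base, update):
    """Recursively merge `update` into a shallow copy of `base`.

    Where both sides hold a dict under the same key, the dicts are merged;
    otherwise the value from `update` wins.  Untouched sub-values are
    shared with the inputs, not deep-copied.
    """
    result = dict(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result


first = {"key1": {"key2a": {"key3a": "Some random string", "key3b": 42}}}
second = {"key1": {"key2a": {"key3c": [1, 2]}, "key2b": [1, 2, 3, 4]}}

merged = merge(first, second)

# For contrast: a plain dict.update() replaces the whole "key1" subtree.
shallow = dict(first)
shallow.update(second)
```

With the recursive version, `merged["key1"]["key2a"]` keeps `key3a` and
`key3b` and gains `key3c`, whereas the plain update in `shallow` loses
`key3a` and `key3b`.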