Draft PEP on RSON configuration file format
Patrick Maupin
pmaupin at gmail.com
Mon Mar 1 21:19:34 EST 2010
Kirill:
Thank you for your constructive criticism. This is the gem that made
it worthwhile to post my document. I think all of your points are
spot-on, and I will be fixing the documentation.
I can well believe that the C implementation of YAML is much faster
than the Python one, but I am aiming for something that will be
reasonably quick in pure Python. I will double-check the JSON C test
results, but something I probably did not make clear is that the 22
seconds is not spent parsing -- that is for the entire test, which
involves reading reStructuredText and generating some 160 separate
PDF files.
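
To put a number on the parsing step alone, something like the sketch below could be used. It is only an illustration: the document is synthetic (built by a hypothetical make_config helper), not one of the real rst2pdf configuration files, and the timing will differ from machine to machine.

```python
import json
import timeit


def make_config(entries=1000):
    """Build a synthetic, flat JSON document with one entry per line.

    This stands in for a real configuration file (an assumption).
    """
    data = {"key%d" % i: "value %d" % i for i in range(entries)}
    return json.dumps(data, indent=1)


def time_parse(text, repeat=5):
    """Return the best wall-clock time, in seconds, of one json.loads(text)."""
    return min(timeit.repeat(lambda: json.loads(text), number=1, repeat=repeat))


text = make_config()
elapsed = time_parse(text)
print("parsed %d lines in %.3f ms" % (text.count("\n") + 1, elapsed * 1000))
```

Timing the parse in isolation like this keeps it separate from the cost of forking and PDF generation, which dominate the 22 seconds.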
Best regards,
Pat
On Mon, Mar 1, 2010 at 8:02 PM, Kirill Simonov <xi at gamma.dn.ua> wrote:
> Patrick Maupin wrote:
>>
>> All:
>>
>> Finding .ini configuration files too limiting, JSON and XML too hard to
>> manually edit, and YAML too complex to parse quickly, I have started
>> work on a new configuration file parser.
>
> I'd like to note that with the optional libyaml bindings, the PyYAML parser
> is pretty fast.
>
>> I call the new format RSON (for "Readable Serial Object Notation"),
>> and it is designed to be a superset of JSON.
>>
>> I would love for it to be considered valuable enough to be a part of
>> the standard library, but even if that does not come to pass, I would
>> be very interested in feedback to help me polish the specification,
>> and then possibly help for implementation and testing.
>>
>> The documentation is in rst PEP form, at:
>>
>> http://rson.googlecode.com/svn/trunk/doc/draftpep.txt
>
> === cut ===
> Because YAML does allow for highly readable configuration files, it
> is tempting to overlook its other flaws for the task. But a fully
> (or almost) compliant parser has to understand the whole YAML
> specification, and this is apparently expensive. Running the rst2pdf
> testsuite, without sphinx or most of the other optional packages, in
> "fast" mode (preloading all the modules, and then forking for every
> test) generates 161 smallish PDF files, totaling around 2.5 MB. On
> one test system this process takes 22 seconds. Disabling the _json C
> scanner and reading the configuration files using the json pure Python
> implementation adds about 0.3 seconds to the 22 seconds. But using
> pyyaml v. 3.09 instead of json adds 33 seconds to the 22 second process!
> It might seem that this is an edge case, but it makes it unacceptable to
> use YAML for this sort of testing, and taking 200 ms to read in 1000
> lines of simple JSON will be unacceptable in many other application
> domains as well.
> === cut ===
>
> I'd question your testing methodology. From your description, it looks like
> the _json speedup was never enabled. Also, PyYAML provides optional bindings
> to libyaml, which make parsing and emitting YAML much faster. In my tests,
> it parses a 10 MB file in 3 seconds.
>
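
One way to check whether the C accelerator is actually available is sketched below. It relies on the c_make_scanner and c_scanstring names that the standard json package sets to None when _json cannot be imported; the helper name json_c_speedups_enabled is made up for this example.

```python
import json.decoder
import json.scanner


def json_c_speedups_enabled():
    """Report whether the json package found its C accelerator (_json)."""
    return (json.scanner.c_make_scanner is not None
            and json.decoder.c_scanstring is not None)


print("json C speedups available:", json_c_speedups_enabled())
```

PyYAML exposes a comparable flag, yaml.__with_libyaml__, and the C-backed loader classes such as yaml.CSafeLoader are only importable when the libyaml bindings were built.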
> === cut ===
> RSON semantics are based on JSON. Like JSON, an RSON document represents
> either a single scalar object, or a DAG (Directed Acyclic Graph), which
> may contain only a few simple data types.
> === cut ===
>
> JSON doesn't represent a DAG, at least, not an arbitrary DAG since each node
> in the document has no more than one parent. It would be more accurate to
> say that it represents a tree-like structure.
>
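
The tree-versus-DAG point is easy to see with the standard json module. In this small sketch (the address values are invented for illustration), one Python dict is shared by two keys, and the sharing does not survive a round trip through JSON:

```python
import json

address = {"street": "1 Main St", "city": "Springfield"}
order = {"ship_to": address, "bill_to": address}  # one object, two parents

# In memory the two keys refer to the same dict.
assert order["ship_to"] is order["bill_to"]

restored = json.loads(json.dumps(order))

# After the round trip the content is equal, but the sharing is gone:
# each node in the parsed document has exactly one parent.
print(restored["ship_to"] == restored["bill_to"])  # True
print(restored["ship_to"] is restored["bill_to"])  # False
```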
> === cut ===
> The YAML syntax for supporting back-references was considered and deemed
> unsatisfactory. A human user who wants to put identical information in a
> "ship to" and "bill to" address is much more likely to use cut and paste
> than he is to understand and use back-references, so the additional overhead
> of supporting more complex document structures is unwarranted.
>
> The concept of a "merge" in YAML, where two sub-trees of data can be
> merged together (similar to a recursive Python dictionary update)
> is quite useful, though, and will be copied. This does not alter the
> outcome that parsing an RSON file will result in a DAG, but does give
> more flexibility in the syntax that can be used to achieve a particular
> output DAG.
> === cut ===
>
> This paragraph assumes the reader is familiar with intricate details of the
> YAML grammar and semantics. I bet most of your audience are completely lost
> here.
>
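
For readers who do not know YAML, the phrase "recursive Python dictionary update" can be illustrated with a few lines of Python. This is only a sketch of the general idea, not the merge rule of YAML or of the RSON draft; the merge function and the sample data are made up for the example.

```python
def merge(base, update):
    """Return a new dict: `base` with `update` merged in recursively.

    Where both sides hold a dict under the same key, the two are merged
    key by key; any other value in `update` replaces the one in `base`.
    """
    result = dict(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result


defaults = {"page": {"size": "A4", "margin": {"top": 1, "bottom": 1}}}
override = {"page": {"margin": {"top": 2}}, "title": "Report"}

merged = merge(defaults, override)
print(merged)
# {'page': {'size': 'A4', 'margin': {'top': 2, 'bottom': 1}}, 'title': 'Report'}
```

A plain dict.update() would have replaced the whole "page" entry; the recursive version keeps "size" and "bottom" while overriding only "top".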
> === cut ===
> Enhanced example::
>
>     key1/key2a
>         key3a = Some random string
>         key3b = 42
>     key1/key2a
>         key3c
>             1
>             2
>             {}
>                 key4a = anything
>                 key4b = something else
>             []
>                 a
>                 b
>                 c
>             3
>             4
>     key1/key2b = [1, 2, 3, 4]
>     key5 = ""
>         This is a multi-line string. It is
>         dedented to the farthest left
>         column that is indented from
>         the line containing "".
>     key6 = [""]
>         This is an array of strings, one per line.
>         Each string is dedented appropriately.
> === cut ===
>
> Frankly, this is an example that only a mother could love. I'd suggest that
> you add some real-world examples, make sure they look nice, and put them in
> the introductory part of the document. Examples are how readers will
> evaluate the format, and yours don't stand a chance.
>
> Seriously, the only reason YAML enjoys its moderate popularity despite its
> overcomplicated grammar, chronic lack of manpower and deficient
> implementations is because it's so cute.
>
>
>
> Disclaimer: I'm the author of PyYAML and libyaml.
>
> Thanks,
> Kirill
>