Draft PEP on RSON configuration file format

Kirill Simonov xi at gamma.dn.ua
Mon Mar 1 21:02:34 EST 2010


Patrick Maupin wrote:
> All:
> 
> Finding .ini configuration files too limiting, JSON and XML too hard to
> manually edit, and YAML too complex to parse quickly, I have started
> work on a new configuration file parser.

I'd like to note that with the optional libyaml bindings, the PyYAML 
parser is pretty fast.

> I call the new format RSON (for "Readable Serial Object Notation"),
> and it is designed to be a superset of JSON.
> 
> I would love for it to be considered valuable enough to be a part of
> the standard library, but even if that does not come to pass, I would
> be very interested in feedback to help me polish the specification,
> and then possibly help for implementation and testing.
> 
> The documentation is in rst PEP form, at:
> 
> http://rson.googlecode.com/svn/trunk/doc/draftpep.txt

=== cut ===
Because YAML does allow for highly readable configuration files, it
is tempting to overlook its other flaws for the task.  But a fully
(or almost) compliant parser has to understand the whole YAML
specification, and this is apparently expensive.  Running the rst2pdf
testsuite, without sphinx or most of the other optional packages, in
"fast" mode (preloading all the modules, and then forking for every
test) generates 161 smallish PDF files, totaling around 2.5 MB.  On
one test system this process takes 22 seconds.  Disabling the _json C
scanner and reading the configuration files using the json pure Python
implementation adds about 0.3 seconds to the 22 seconds.  But using
pyyaml v. 3.09 instead of json adds 33 seconds to the 22 second process!
It might seem that this is an edge case, but it makes it unacceptable to
use YAML for this sort of testing, and taking 200 ms to read in 1000
lines of simple JSON will be unacceptable in many other application
domains as well.
=== cut ===

I'd question your testing methodology.  From your description, it looks 
like the _json speedup was never enabled.  Also, PyYAML provides optional 
bindings to libyaml, which make parsing and emitting YAML much faster.  
In my tests, it parses a 10 MB file in 3 seconds.
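
To make the comparison concrete, here is a rough sketch of how the 
parsers could be timed on the same input.  The test data and the helper 
names (make_document, time_parse, run_benchmark) are made up for 
illustration and are not taken from the rst2pdf testsuite.  The sketch 
assumes a modern Python 3; the libyaml-backed loader (yaml.CSafeLoader) 
is only present when PyYAML was built with the libyaml bindings, so the 
code checks for it instead of assuming it.

```python
import json
import time

try:
    import yaml  # PyYAML, a third-party package
except ImportError:
    yaml = None


def make_document(n_items=1000):
    """Build a JSON text of n_items small records (made-up test data)."""
    records = [{"id": i, "name": "item%d" % i, "tags": ["a", "b"]}
               for i in range(n_items)]
    return json.dumps(records, indent=2)


def time_parse(parse, text, repeat=3):
    """Return (best wall-clock seconds, parsed result) over `repeat` runs."""
    best = None
    result = None
    for _ in range(repeat):
        start = time.perf_counter()
        result = parse(text)
        elapsed = time.perf_counter() - start
        best = elapsed if best is None else min(best, elapsed)
    return best, result


def run_benchmark(n_items=1000):
    """Time every available parser on the same text; return {name: seconds}."""
    text = make_document(n_items)
    parsers = {"json": json.loads}
    if yaml is not None:
        # Pure-Python PyYAML parser.
        parsers["yaml (pure Python)"] = (
            lambda s: yaml.load(s, Loader=yaml.SafeLoader))
        # libyaml-backed parser, only if the C bindings were built.
        c_loader = getattr(yaml, "CSafeLoader", None)
        if c_loader is not None:
            parsers["yaml (libyaml)"] = (
                lambda s: yaml.load(s, Loader=c_loader))
    expected = json.loads(text)
    timings = {}
    for name, parse in parsers.items():
        seconds, result = time_parse(parse, text)
        # Every parser should produce the same data from the same text.
        assert result == expected, name
        timings[name] = seconds
    return timings


if __name__ == "__main__":
    for name, seconds in run_benchmark().items():
        print("%-20s %.4f s" % (name, seconds))
```

The absolute numbers will depend on the machine and the input, so the 
useful output is the ratio between the rows, and whether the libyaml row 
appears at all.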

=== cut ===
RSON semantics are based on JSON.  Like JSON, an RSON document represents
either a single scalar object, or a DAG (Directed Acyclic Graph), which
may contain only a few simple data types.
=== cut ===

JSON doesn't represent a DAG, or at least not an arbitrary DAG, since 
each node in the document has no more than one parent.  It would be more 
accurate to say that it represents a tree-like structure.
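
A small illustration of the difference, using only the standard json 
module.  The address data and the has_shared_nodes helper are invented 
for this example.  In memory, two keys can point at the same dict (a 
node with two parents), but that sharing does not survive a round trip 
through JSON text:

```python
import json


def has_shared_nodes(obj, seen=None):
    """Return True if any dict or list is reachable by more than one path."""
    if seen is None:
        seen = set()
    if isinstance(obj, (dict, list)):
        if id(obj) in seen:
            return True  # second path to the same container
        seen.add(id(obj))
        children = obj.values() if isinstance(obj, dict) else obj
        return any(has_shared_nodes(child, seen) for child in children)
    return False


# In memory: one address dict with two parents, i.e. a DAG, not a tree.
address = {"street": "1 Main St", "city": "Springfield"}
order = {"ship_to": address, "bill_to": address}

# Round trip through JSON text.
restored = json.loads(json.dumps(order))

# Same values, but the parser built two independent copies of the address.
values_equal = restored == order
sharing_before = has_shared_nodes(order)
sharing_after = has_shared_nodes(restored)

if __name__ == "__main__":
    print(values_equal, sharing_before, sharing_after)
```

The values compare equal, yet the object that comes back is a plain tree: 
changing restored["ship_to"] no longer affects restored["bill_to"].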

=== cut ===
The YAML syntax for supporting back-references was considered and deemed
unsatisfactory. A human user who wants to put identical information in a
"ship to" and "bill to" address is much more likely to use cut and paste
than he is to understand and use backreferences, so the additional overhead
of supporting more complex document structures is unwarranted.

The concept of a "merge" in YAML, where two sub-trees of data can be
merged together (similar to a recursive Python dictionary update)
is quite useful, though, and will be copied.  This does not alter the
outcome that parsing a RSON file will result in a DAG, but does give
more flexibility in the syntax that can be used to achieve a particular
output DAG.
=== cut ===

This paragraph assumes the reader is familiar with the intricate details 
of the YAML grammar and semantics.  I bet most of your audience will be 
completely lost here.
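
For readers who don't know YAML: back-references are its anchors and 
aliases (&name and *name), and the merge is its "<<" key.  The draft 
describes the merge as "similar to a recursive Python dictionary update", 
and a few lines of Python would explain that better than the prose does.  
What follows is only my sketch of that idea, with a hypothetical merge 
function and made-up configuration data; it is not the algorithm that 
either YAML or RSON specifies.

```python
def merge(base, update):
    """Return a new dict with `update` merged into `base`.

    Where both sides hold a dict under the same key, the two dicts are
    merged recursively; otherwise the value from `update` wins.  Neither
    argument is modified, but nested values that are not merged are
    reused in the result, not copied.
    """
    result = dict(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result


# Made-up configuration data for illustration.
defaults = {"server": {"host": "localhost", "port": 8080}, "debug": False}
overrides = {"server": {"port": 9090}, "debug": True}

merged = merge(defaults, overrides)

# For contrast: a plain dict.update() replaces the nested dict wholesale,
# so the "host" key from the defaults is lost.
flat = dict(defaults)
flat.update(overrides)

if __name__ == "__main__":
    print(merged)
    print(flat)
```

The recursive version keeps "host" from the defaults and takes "port" 
from the overrides, while the plain update drops "host" entirely.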

=== cut ===
Enhanced example::

     key1/key2a
         key3a = Some random string
         key3b = 42
     key1/key2a
         key3c
             1
             2
             {}
                 key4a = anything
                 key4b = something else
             []
                 a
                 b
                 c
             3
             4
     key1/key2b = [1, 2, 3, 4]
     key5 = ""
        This is a multi-line string.  It is
           dedented to the farthest left
           column that is indented from
           the line containing "".
     key6 = [""]
        This is an array of strings, one per line.
        Each string is dedented appropriately.
=== cut ===

Frankly, this is an example that only a mother could love.  I'd suggest 
you add some real-world examples, make sure they look nice, and put 
them in the introductory part of the document.  Examples are how the 
format will be evaluated by readers, and yours don't stand a chance.

Seriously, the only reason YAML enjoys its moderate popularity despite 
its overcomplicated grammar, chronic lack of manpower and deficient 
implementations is that it's so cute.



Disclaimer: I'm the author of PyYAML and libyaml.

Thanks,
Kirill


