[Python-ideas] PEP 426, YAML in the stdlib and implementation discovery

Philipp A. flying-sheep at web.de
Fri May 31 21:23:59 CEST 2013


2013/5/31 Andrew Barnert <abarnert at yahoo.com>

> But YAML is up to 1.2, and I believe most libraries (including PyYAML)
> only handle 1.1 so far. There are also known bugs in the 1.1 specification
> (e.g., "." is a valid float literal, but doesn't specify 0.0 or any other
> valid value), that each library has to work around. There are features of
> the standard, such as YAML<->XML bindings, that are still in early stages
> of design. Maybe a
> YAML-1.1-as-interpreted-by-the-majority-of-the-quasi-reference-implementations
> library doesn't need to evolve, but a YAML library does.
>

afaik YAML 1.2 exists to clarify those mentioned bugs, since they have all
been found and people needed a bugfree standard.

also could you mention a bug with a non-obvious solution? i don’t think any
yaml implementation is going to interpret “.” as something other than 0.0

> Are you suggesting importing PyYAML (in modified form, and without the
> libyaml-binding "fast" implementation) into the stdlib, or building a new
> one? If the former, have you talked to Kirill Simonov? If the latter, are
> you proposing to build it, or just suggesting that it would be nice if
> somebody did?
>

i don’t know: would it aid my argument if i had asked him or written my
own? (i’ve done neither, because unless Guido says “I can see
YAML in the stdlib” it would be pointless imho)

> Do you mean adding a load_iter() akin to load_all() except that it yields
> one document at a time, or a SAX-like API instead of a simple load()?
>

no i meant that the lexer should be a generator (e.g. “[int(token) for
token in YAMLLexer(open('myfile.yml')).lex()]”) and/or have an API accepting
incomplete yaml chunks and emitting tokens, like “for token in
lexer.feed(stream.read())”

but what you said is also necessary for the format: lexing a long
stream of documents coming in through the network doesn’t make sense any
other way.
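
a rough sketch of the feed-style idea, to make it concrete. everything here is hypothetical: ChunkLexer, feed() and close() are made-up names, and the "tokens" are just whitespace-separated words, nothing like real YAML tokens. the only point is that a token split across two chunks is held back until the rest arrives:

```python
from typing import Iterable, Iterator


class ChunkLexer:
    """Toy incremental lexer (hypothetical; not a real YAML lexer).

    Tokens are whitespace-separated words, which keeps the sketch short.
    A real YAML lexer would also track indentation, quoting, and so on.
    """

    def __init__(self) -> None:
        # Trailing text that may be the start of an unfinished token.
        self._pending = ""

    def feed(self, chunk: str) -> Iterator[str]:
        """Accept a possibly incomplete chunk and yield finished tokens."""
        data = self._pending + chunk
        parts = data.split()
        if parts and not data[-1].isspace():
            # The last word may continue in the next chunk: hold it back.
            self._pending = parts.pop()
        else:
            self._pending = ""
        yield from parts

    def close(self) -> Iterator[str]:
        """Signal end of input and flush the held-back token, if any."""
        if self._pending:
            yield self._pending
            self._pending = ""


def lex(chunks: Iterable[str]) -> Iterator[str]:
    """Generator front end: lazily lex an iterable of chunks."""
    lexer = ChunkLexer()
    for chunk in chunks:
        yield from lexer.feed(chunk)
    yield from lexer.close()


# "2" and "3" arrive in different chunks but form the single token "23".
tokens = [int(token) for token in lex(["1 2", "3 4", "5 \n6"])]
```

since feed() is itself a generator, nothing is consumed until the caller iterates over it, which fits reading from a network stream chunk by chunk.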

> Your registration mechanism would mean they don't have to do this; they
> just import from the stdlib, and if lxml is present and registered, it
> would be loaded instead.
>

exactly

> There are a few examples of something similar, both in and out of the
> stdlib. For example:
>
> The dbm module basically works like this: you can import dbm.ndbm, or you
> can just import dbm to get the best available implementation. That isn't
> done by hooking the import, but rather by providing a handful of wrapper
> functions that forward to the chosen implementation. Is that reasonable for
> YAML, or are there too many top-level functions or too much module-level
> global state or something?
>

i think so: as i said, we’d need to define an API. since it’s “just” a
serialization language, i think we could go with not much more than

   - load(fileobj_or_filename, safe=True) #maybe better than an unsafe_blah
   for each loadlike function
   - load_iter(fileobj_or_filename, safe=True)
   - loads(string, safe=True)
   - loads_iter(string, safe=True)
   - dump()
   - dumps()
   - YAMLLexer #with some methods and defined constructors
   - YAMLParser #accepting tokens from the lexer
   - YAMLTokens #one of the new, shiny enums
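
a minimal sketch of the dbm-style wrapper approach for the load/dump part of that list. assumptions: the backend here is a stand-in built on the stdlib json module (so it only reads the JSON subset), JsonBackend and _backend are made-up names, and the safe flag is only passed through, since a JSON-only backend has nothing unsafe to construct:

```python
import io
import json


class JsonBackend:
    """Stand-in implementation (hypothetical): handles JSON documents only."""

    name = "json-standin"

    def loads(self, text, safe=True):
        # A real YAML backend would refuse arbitrary object construction
        # when safe is true; plain JSON never constructs such objects.
        return json.loads(text)

    def dumps(self, obj):
        return json.dumps(obj)


# The chosen implementation; a real module would pick this at import time.
_backend = JsonBackend()


def loads(text, safe=True):
    """Forward to whichever implementation was chosen."""
    return _backend.loads(text, safe=safe)


def load(fileobj_or_filename, safe=True):
    """Accept either a filename or an object with a read() method."""
    if isinstance(fileobj_or_filename, str):
        with open(fileobj_or_filename, encoding="utf-8") as fileobj:
            return loads(fileobj.read(), safe=safe)
    return loads(fileobj_or_filename.read(), safe=safe)


def dumps(obj):
    return _backend.dumps(obj)


def dump(obj, fileobj):
    fileobj.write(dumps(obj))


# Round trip through an in-memory file.
buffer = io.StringIO()
dump({"name": "example", "versions": [1, 2]}, buffer)
buffer.seek(0)
round_tripped = load(buffer)
```

callers only ever touch the module-level functions, so swapping _backend for another implementation would not change their code.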


> BeautifulSoup
>
> […]
>
> tulip
>

also nice ideas


> What all of these are missing is a way for an unknown third-party
> implementation to plug themselves in as the new best. Of course you can
> always monkeypatch it at runtime (dbm._names.insert(0, __name__)), but you
> want to do it at _install_ time, which is a different story.
>
> One further issue is that sometimes the system administrator (or end user)
> might want to affect the default choice for programs running on his
> machine. For example, lxml is built around libxml2. Mac OS X 10.6, some
> linux distros, etc. come with old or buggy versions of libxml2. You might
> want to install lxml anyway and make it the default for BeautifulSoup, but
> not for ElementTree, or vice-versa.
>
> Finally, what happens if you install two modules on your system which both
> register as implementations of the same module?
>

i think we can’t allow them to modify some system-global list, since
everything would install itself as #1, so it would be pointless.
i don’t know how to select one, but we should expose a systemwide way to
configure the used one (like .pythonrc?), as well as a way to directly use
one from python (as said above). then it wouldn’t matter much, since the
admin is required to only install one, or configure the system to use the
preferred one.

the important things are imho to make the system discoverable and
transparent, exposing the found implementations and the used one as well as
we can.
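
one way this could look, as a sketch only. the registry functions, the PYTHON_YAML_IMPL environment variable and the "pure" default are all invented for illustration; nothing like this exists in the stdlib. registering never changes priority, so an implementation cannot install itself as #1; the choice comes from an explicit argument, then from system configuration, then from a fixed default:

```python
import os

ENV_VAR = "PYTHON_YAML_IMPL"  # hypothetical system-wide setting
DEFAULT = "pure"              # hypothetical built-in fallback name

_registry = {}  # implementation name -> factory


def register(name, factory):
    """Make an implementation discoverable; does not affect priority."""
    _registry[name] = factory


def available():
    """Expose every implementation that was found."""
    return sorted(_registry)


def select(preferred=None, environ=None):
    """Return the name of the implementation that would be used.

    Order: explicit argument, then the environment variable, then DEFAULT.
    """
    environ = os.environ if environ is None else environ
    name = preferred or environ.get(ENV_VAR) or DEFAULT
    if name not in _registry:
        raise LookupError(
            "implementation %r not found; available: %s"
            % (name, ", ".join(available()) or "none")
        )
    return name


# Two implementations register themselves; neither becomes the default.
register("pure", object)
register("fast", object)

chosen_default = select(environ={})
chosen_by_admin = select(environ={ENV_VAR: "fast"})
chosen_by_program = select("pure", environ={ENV_VAR: "fast"})
```

available() and select() together cover the "expose the found implementations and the used one" part, and the error message lists the alternatives when the configured one is missing.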

> Note that JSON is a strict subset of YAML 1.2, and not too far from a
> subset of 1.1. So, you could propose exclusive YAML, and make sticking
> within the JSON schema and syntax required for packages compatible with
> Python 3.3 and earlier, but optional for 3.4+ packages.
>

yeah. pretty nice. but i don’t think a stdlib yaml can land before 3.5.