2013/5/31 Andrew Barnert <abarnert@yahoo.com>
But YAML is up to 1.2, and I believe most libraries (including PyYAML) only handle 1.1 so far. There are also known bugs in the 1.1 specification (e.g., "." is a valid float literal, but the spec doesn't say whether it means 0.0 or some other value), which each library has to work around. There are features of the standard, such as YAML<->XML bindings, that are still in early stages of design. Maybe a YAML-1.1-as-interpreted-by-the-majority-of-the-quasi-reference-implementations library doesn't need to evolve, but a YAML library does.

afaik YAML 1.2 exists to fix exactly those bugs: once they had all been found, people needed a bug-free standard.

also, could you mention a bug with a non-obvious solution? i don’t think any yaml implementation is going to interpret “.” as anything other than 0.0
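To make the "." issue concrete, here is a small check. It assumes that the base-10 alternative of the YAML 1.1 float resolution pattern is `[-+]?([0-9][0-9_]*)?\.[0-9.]*([eE][-+][0-9]+)?`; that pattern is reproduced from memory, so treat it as an assumption. The pattern accepts a lone "." without assigning it a value, while Python's own float() rejects it, so each library has to decide for itself what the scalar means.

```python
import re
from typing import Optional

# Assumption: base-10 alternative of the YAML 1.1 float resolution pattern
# (tag:yaml.org,2002:float), reproduced from memory.
YAML11_FLOAT_BASE10 = re.compile(r"[-+]?([0-9][0-9_]*)?\.[0-9.]*([eE][-+][0-9]+)?")


def looks_like_float(scalar: str) -> bool:
    """True if the whole plain scalar matches the base-10 float pattern."""
    return YAML11_FLOAT_BASE10.fullmatch(scalar) is not None


def python_float_or_none(scalar: str) -> Optional[float]:
    """Python's own float() on the scalar (underscores removed), or None if it refuses."""
    try:
        return float(scalar.replace("_", ""))
    except ValueError:
        return None


for s in ["1.5", "-2.", ".5", "."]:
    # "." matches the pattern, but float(".") raises, so it yields None.
    print(repr(s), looks_like_float(s), python_float_or_none(s))
```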

Are you suggesting importing PyYAML (in modified form, and without the libyaml-binding "fast" implementation) into the stdlib, or building a new one? If the former, have you talked to Kirill Simonov? If the latter, are you proposing to build it, or just suggesting that it would be nice if somebody did?

i don’t know: would it strengthen my argument if i had asked him or written my own? (i’ve done neither, because unless Guido says “I can see YAML in the stdlib” it would be pointless imho)

Do you mean adding a load_iter() akin to load_all() except that it yields one document at a time, or a SAX-like API instead of a simple load()? 

no, i meant that the lexer should be a generator (e.g. “[int(token) for token in YAMLLexer(open('myfile.yml')).lex()]”) and/or that there should be an API accepting incomplete yaml chunks and emitting tokens, like “for token in lexer.feed(stream.read())”

but what you said is also necessary for the format: lexing a long stream of documents arriving over the network wouldn’t make sense any other way.
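A minimal sketch of the chunk-fed variant, assuming a hypothetical `YAMLLexer` whose `feed()` buffers incomplete input and returns tokens only for complete lines, plus a `close()` that flushes whatever is left. The token set is a toy (document start and flat `key: value` pairs) standing in for real YAML tokens, and `YAMLTokens` here is an illustrative enum, not an existing API.

```python
from enum import Enum
from typing import Iterator, List, Tuple


class YAMLTokens(Enum):
    # Hypothetical, heavily simplified token kinds (not real YAML tokens).
    DOCUMENT_START = "---"
    KEY = "key"
    SCALAR = "scalar"


Token = Tuple[YAMLTokens, str]


class YAMLLexer:
    """Toy incremental lexer: accepts arbitrary chunks, lexes only complete lines.

    Understands just '---' and flat 'key: value' lines; real YAML needs far more.
    """

    def __init__(self) -> None:
        self._buffer = ""  # received text not yet terminated by a newline

    def feed(self, chunk: str) -> Iterator[Token]:
        # Buffer eagerly, so input is kept even if the result is never iterated.
        self._buffer += chunk
        *complete, self._buffer = self._buffer.split("\n")
        return self._lex_lines(complete)

    def close(self) -> Iterator[Token]:
        # Flush a final line that arrived without a trailing newline.
        rest, self._buffer = self._buffer, ""
        return self._lex_lines([rest])

    def _lex_lines(self, lines: List[str]) -> Iterator[Token]:
        for line in lines:
            line = line.strip()
            if not line:
                continue
            if line == "---":
                yield (YAMLTokens.DOCUMENT_START, line)
                continue
            key, sep, value = line.partition(":")
            if sep:
                yield (YAMLTokens.KEY, key.strip())
                if value.strip():
                    yield (YAMLTokens.SCALAR, value.strip())
            else:
                yield (YAMLTokens.SCALAR, line)


lexer = YAMLLexer()
tokens: List[Token] = []
# Chunk boundaries deliberately split a key and a key/value pair.
for chunk in ["---\nna", "me: yaml\nversion:", " 1.2"]:
    tokens.extend(lexer.feed(chunk))
tokens.extend(lexer.close())
print(tokens)
```

Because `feed()` only emits tokens for finished lines, a caller reading from the network can pass in whatever arrived without caring where the chunk boundaries fall.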

Your registration mechanism would mean they don't have to do this; they just import from the stdlib, and if lxml is present and registered, it would be loaded instead.


There are a few examples of something similar, both in and out of the stdlib. For example:

The dbm module basically works like this: you can import dbm.ndbm, or you can just import dbm to get the best available implementation. That isn't done by hooking the import, but rather by providing a handful of wrapper functions that forward to the chosen implementation. Is that reasonable for YAML, or are there too many top-level functions or too much module-level global state or something?

i think so: as i said, we’d need to define an API. since it’s “just” a serialization language, i think we could go with not much more than
  • load(fileobj_or_filename, safe=True) # maybe better than an unsafe_* variant of each load-like function
  • load_iter(fileobj_or_filename, safe=True)
  • loads(string, safe=True)
  • loads_iter(string, safe=True)
  • dump()
  • dumps()
  • YAMLLexer # with some methods and defined constructors
  • YAMLParser # accepting tokens from the lexer
  • YAMLTokens # one of the new, shiny enums




also nice ideas
What all of these are missing is a way for an unknown third-party implementation to plug itself in as the new best. Of course you can always monkeypatch it at runtime (dbm._names.insert(0, __name__)), but you want to do it at _install_ time, which is a different story.

One further issue is that sometimes the system administrator (or end user) might want to affect the default choice for programs running on his machine. For example, lxml is built around libxml2. Mac OS X 10.6, some Linux distros, etc. come with old or buggy versions of libxml2. You might want to install lxml anyway and make it the default for BeautifulSoup, but not for ElementTree, or vice-versa.

Finally, what happens if you install two modules on your system which both register as implementations of the same module?

i think we can’t allow them to modify some system-global list: everything would install itself as #1, so it would be pointless.
i don’t know how to select one, but we should expose a system-wide way to configure which one is used (like .pythonrc?), as well as a way to use one directly from python (as said above). then it wouldn’t matter much, since the admin would only have to install one, or configure the system to use the preferred one.

the important thing imho is to make the system discoverable and transparent, exposing the implementations found and the one in use as clearly as we can.
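One possible policy, sketched under assumptions. Implementations register under a name (here through a hypothetical `register()` call rather than a real install-time hook), and registering never changes which one is used, so nothing can install itself as #1. The choice comes from a system-wide setting, for which a hypothetical `PYTHON_YAML_IMPL` environment variable stands in for a .pythonrc-like file, and otherwise falls back to a hypothetical built-in implementation named `stdlib`. Both the registered implementations and the selected one can be queried, and any of them can be used directly.

```python
import os
from typing import Callable, Dict, List, Mapping, Optional

# Hypothetical environment variable standing in for a system-wide setting.
CONFIG_VAR = "PYTHON_YAML_IMPL"
# Hypothetical name of the implementation shipped with the stdlib.
DEFAULT = "stdlib"

Loader = Callable[[str], object]
_registry: Dict[str, Loader] = {}


def register(name: str, loads: Loader) -> None:
    """Record an implementation; registering never changes which one is used."""
    _registry[name] = loads


def available_implementations() -> List[str]:
    """Discoverability: every implementation that has been registered."""
    return sorted(_registry)


def current_implementation(environ: Mapping[str, str] = os.environ) -> str:
    """The configured implementation, else the built-in default."""
    wanted = environ.get(CONFIG_VAR, DEFAULT)
    if wanted not in _registry:
        raise LookupError(
            "YAML implementation %r is not installed; available: %r"
            % (wanted, available_implementations())
        )
    return wanted


def get_loads(name: Optional[str] = None,
              environ: Mapping[str, str] = os.environ) -> Loader:
    """Use a specific implementation directly, or the configured one."""
    return _registry[name if name is not None else current_implementation(environ)]
```

Raising `LookupError` when the configured implementation is missing, instead of silently falling back, keeps the choice visible to the administrator.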

Note that JSON is a strict subset of YAML 1.2, and not too far from a subset of 1.1. So, you could propose exclusive YAML, and make sticking within the JSON schema and syntax required for packages compatible with Python 3.3 and earlier, but optional for 3.4+ packages.

yeah. pretty nice. but i don’t think a stdlib yaml can land before 3.5.