Hi, while reading PEP 426 I was reminded of what is (IMHO) a longstanding issue: YAML is not in the stdlib.

I’m no big fan of JSON, because it’s so strict and verbose compared with YAML. I just think YAML is more Pythonic, and a better choice for any kind of human-written data format.

So I came up with three ideas:

  1. YAML in the stdlib
    From what I’ve gathered, the stdlib shouldn’t get more C code.
    So let’s put a pure-Python implementation of YAML into the stdlib.
    Let’s also strictly define the API and make it secure-by-naming™. What I mean is: let’s make the safe load function, the one that doesn’t instantiate user-defined classes (called “safe_load” in PyYAML), the default load function “load”, and give the unsafe one a longer, explicit name (e.g. “unsafe_load” or “extended_load” or something).
    Let’s base the parser on generators, since generators are cool, easy to debug, and allow us to emit and test the token stream (unlike e.g. the HTML parser we have).
  2. Implementation discovery
    People want fast parsing. That’s incompatible with a pure-Python implementation.
    So let’s define (or use, if there is one I’m not aware of) a discovery mechanism that allows implementations of certain APIs to register themselves as such.
    Let “import yaml” use this mechanism to import a compatible 3rd party implementation in preference to the stdlib one.
    Let’s define a property of the implementation that tells the user which implementation they’re using, and a way to select a specific implementation (although that’s probably easily done by not doing “import yaml” at all, but “import std_yaml” or “import pyyaml2” instead).
  3. Allow YAML to be used alongside JSON for metadata like that in PEP 426 (so including either pymeta.yaml or pymeta.json makes a valid package).
    I don’t propose that we use YAML exclusively, but only because I think PEP 426 shouldn’t be kept from being implemented ASAP by having to wait for a new stdlib module to be ready.
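
To make ideas 1 and 2 a bit more concrete, here is a rough sketch of how they could fit together. Everything in it is an assumption made for illustration: the helper names “discover” and “make_api”, the “implementation” attribute, and the requirement that every implementation provides “safe_load” and “unsafe_load” callables. It is not an existing mechanism, just one possible shape for it.

```python
import importlib
from types import SimpleNamespace


def discover(candidates, importer=importlib.import_module):
    """Return (name, module) for the first candidate that can be imported.

    `candidates` is ordered by preference: 3rd party implementations
    first, the pure-Python stdlib one last.
    """
    for name in candidates:
        try:
            return name, importer(name)
        except ImportError:
            continue  # not installed, try the next candidate
    raise ImportError("no implementation found among %r" % (tuple(candidates),))


def make_api(name, impl):
    """Expose an implementation under secure-by-naming function names.

    Assumes (hypothetically) that every implementation provides
    `safe_load` and `unsafe_load` callables.
    """
    return SimpleNamespace(
        implementation=name,           # tells the user what they got
        load=impl.safe_load,           # the short name is the safe one
        unsafe_load=impl.unsafe_load,  # the unsafe one needs the explicit name
    )


if __name__ == "__main__":
    # Stand-in for a real parser, so the sketch runs without any YAML lib.
    fake = SimpleNamespace(safe_load=str.strip, unsafe_load=str.strip)
    api = make_api("fake_yaml", fake)
    print(api.implementation, repr(api.load("  a: 1  ")))
```

An “import yaml” shim would then do little more than call discover(("pyyaml2", "std_yaml")) and pass the result to make_api, with both module names being the placeholders from idea 2.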

What do you think?

Is there a reason for not including a YAML lib that I didn’t cover?

Is there a reason JSON is used other than YAML not being in the stdlib?