I would definitely like to have a YAML library--even one with restricted functionality--in the standard library. Anatoly's suggestion of 'yamlish' is fine. I have myself had to write a 'miniyaml.py' as part of one work project. We have some configuration files in a bit of software I wrote/maintain; these files are very much human edited, and their structure fits a subset of YAML very well. There are many things you can do in YAML that we do not need to do. Using JSON, INI format, XML, or other options supported in the Python standard library would have been possible, but in all cases the configuration files would have been far less readable than the way I did it.

Within the particular environment I'm supporting, it is possible but slightly cumbersome to install a 3rd party library like PyYaml. So I wrote <40 lines of code to support the subset we need, while keeping the API compatible. There's nothing special about my implementation, and it's certainly not ready for inclusion in the STDLIB, but the fact that I needed to write it is suggestive to me.

Below is the docstring for my small module. Basically, ANY reasonable subset of YAML supported in the STDLIB would be a drop-in replacement for my trivial code. Well, I suppose that if its API were different from PyYaml's, some small change might be needed, but it would certainly support my simple use case.
This module provides an implementation for a small subset of YAML
Only the constructs needed for parsing an "invariants" file are supported here. The only supported API function of the PyYaml library is 'load_all()'. However, within that restriction, the result returned by 'miniyaml.load_all()'--if loading a string that this module can parse--is intended to be identical to that returned by 'yaml.load_all()'.
The intended use of this module is with an import like:
import miniyaml as yaml
In the presence of an actual PyYaml installation, this can simply be instead:
import yaml
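One could even make the choice automatic with a standard import fallback--a sketch only, assuming the module is named 'miniyaml.py' as above:

    # Prefer the real PyYaml when it is installed; otherwise fall back
    # to the local subset implementation, which keeps a compatible API.
    try:
        import yaml
    except ImportError:
        import miniyaml as yaml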
The parsed subset used in invariants files looks like the example below. If multiline comments (with line breaks) are needed, the usual YAML construct is used. Each invariant block is a new YAML "document":
    Invariant: some_python_construct(anton.leaf.parameter) is something_else
    Comment: This describes the invariant in more human readable terms
    ---
    Invariant: isMultiple(DT, some.type.of.interval)
    Comment: |
      The interval should really be a multiple of timestep
      because of equation:
        sigma = epsilon^2 + (3*foo)^5
      And that's how it works.
    ---
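For illustration only, here is a minimal sketch of roughly what a load_all() for this subset could look like--not my actual module, and it assumes two-space indentation for '|' literal blocks as in the example above (note also that, unlike PyYaml, it leaves all scalar values as strings):

    def load_all(stream):
        """Yield one dict per '---'-separated document, supporting flat
        'Key: value' pairs and '|' literal block scalars only."""
        if hasattr(stream, "read"):
            stream = stream.read()
        doc, key, block = {}, None, []
        for line in stream.splitlines():
            if key is not None:
                # Inside a '|' block: indented and blank lines belong to it.
                if line.startswith("  ") or not line.strip():
                    block.append(line[2:])
                    continue
                doc[key] = "\n".join(block).rstrip() + "\n"
                key, block = None, []
            if line.strip() == "---":
                if doc:
                    yield doc
                doc = {}
            elif ":" in line:
                k, _, v = line.partition(":")
                v = v.strip()
                if v == "|":
                    key = k.strip()
                else:
                    doc[k.strip()] = v
        if key is not None:
            doc[key] = "\n".join(block).rstrip() + "\n"
        if doc:
            yield doc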
On Jun 2, 2013, at 11:23 AM, anatoly techtonik wrote:
FWIW, I am +1 on the ability to read YAML-based configs in Python without dependencies, but waiting several years for it is hard.
Maybe try an alternative, data-driven development process, as opposed to the traditional all-or-nothing PEP style, to speed things up? It is possible to make users happy incrementally and keep development fun without sacrificing too much on the Zen side. If open source is about scratching your own itches, then the most effective way to implement the spec would be to allow people to add support for their own flavor without disrupting the work of others.
For some reason I think most people don't need the full YAML spec, especially if the final implementation will be slow and heavy.
So instead of:

    import yaml

I'd start with something more generic and primitive:

    from datatrans import yamlish
Where `datatrans` is a data transformation framework taking care of the usual text parsing (data partitioning), partition mapping (structure transformation) and conversion (binary to string etc.), trying to be as fast and lightweight as possible, and opening a vast field for future optimizations at the algorithmic level. `yamlish` is an implementation that is not heavily optimized (datatrans is to yamlish as RPython is to PyPy) and that can (hopefully) be easily extended to cover more of YAML. Hence the name - it doesn't pretend to parse YAML - it parses some supported subset, which is improved over time by different parties (provided datatrans is done right and yields readable, i.e. maintainable and extensible, implementation code).
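To illustrate the three stages (a sketch only - all names here are made up, not an existing API):

    # Hypothetical shape of a `datatrans` pipeline: each data flavor
    # plugs its own three callables into a shared driver.
    def transform(text, partition, map_structure, convert):
        chunks = partition(text)        # text parsing: data partitioning
        tree = map_structure(chunks)    # partition mapping: structure
        return convert(tree)            # conversion: e.g. string to int

    # A trivial "key: value per line" flavor:
    result = transform(
        "a: 1\nb: 2",
        partition=str.splitlines,
        map_structure=lambda lines: dict(l.split(": ", 1) for l in lines),
        convert=lambda d: dict((k, int(v)) for k, v in d.items()),
    )
    assert result == {"a": 1, "b": 2}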
There is an existing package called `yamlish` on PyPI - I am not talking about that one - it is PyYAML-based, which is not an option for now as I see it. So I stole its name. Sorry. That PyPI package was used to parse the TAP format, which is, again, a subset. Subsets..
It appears that YAML is good for humans precisely through its subsets. That leaves the impression (maybe it's just an illusion) that the development work for subset support can also be partitioned. If a `datatrans` "done right" is possible, it will allow incremental addition of new YAML features as the need for them arises (as new data examples are added). Or it can help build parsers for YAML subsets that are intentionally limited to keep them performance-efficient.
Because `datatrans` is a package that isolates the parsing, mapping and conversion parts of the process to make it modular and extensible, it can serve as a reference point for various kinds of (scientific) papers, including ones that prove such a data transformation framework is impossible. As for the `yamlish` submodule, the first important paper covering it will be a matrix of its supported features.
While it all sounds too complicated, driving development by data and real short-term user needs (as opposed to designing everything upfront) will make the process more attractive. In data-driven development there are not many things that can break - you either continue parsing previous data or you don't. The output of the parsing process may change over time, but that can be controlled by configuring the last step of the data transformation phase.
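In code, the "can we still parse previous data?" check is just a corpus of examples kept next to the parser (the file layout below is hypothetical, a sketch of the idea):

    import glob, json

    def check_corpus(parse):
        # Each examples/*.in file is paired with a *.expected.json file
        # recording the output accepted when the example was added.
        for path in sorted(glob.glob("examples/*.in")):
            with open(path) as f:
                got = parse(f.read())
            with open(path + ".expected.json") as f:
                expected = json.load(f)
            assert got == expected, "regression in %s" % path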
`Parsing AppEngine config file` and `reading package meta data` are good starting points. Once the package meta data subset is parsed, it is done and won't break. The implementation for meta data parsing may mature in the distutils package, the one for AppEngine in its SDK, with patches for `datatrans` itself sent to the stdlib. The only open question is the design of the output format for the parse stage. I am not sure everything should be converted into Python objects using the "best fit" technique.
I will be pretty comfortable if the target format is not native Python objects at all. More than that - I will even insist on avoiding conversion to native Python objects from the start. The ideal output for the first version would be a generic tree structure with defined names for YAML elements - a tree that can be represented as XML where these names are tags, and can therefore be traversed and selected with XPath/jQuery-style syntax.
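For a rough sketch of that idea, the stdlib xml.etree module already provides such a tree with a limited XPath subset for selection (the element names below are invented, not a proposal):

    import xml.etree.ElementTree as ET

    # Build the "generic tree" for one parsed mapping entry.
    doc = ET.Element("document")
    pair = ET.SubElement(doc, "mapping")
    ET.SubElement(pair, "key").text = "Invariant"
    ET.SubElement(pair, "scalar").text = "isMultiple(DT, interval)"

    # Traverse and select with ElementTree's XPath subset.
    for node in doc.findall(".//scalar"):
        print(node.text)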
It will take several years for the implementation to mature, and in the end there will be plenty of backward compatibility matters around the API, formatting and serializing. So the first thing I'd do is [abandon serialization]. From the point of view of my self-proclaimed data transformation theory, the input and output formats are data. If the output format is not human-readable - as linked Python data structures in memory are not - it wastes time and hinders development. Serializing Python objects is a problem of a different level, an example of a binary, abstract, memory-only output format - a lot of properties that you don't want to deal with while working with data.
To summarize:

1. full spec support is no goal
2. data driven (using real world examples/stories)
3. incremental (one example/story at a time)
4. research based (beautiful ideas vs ugly real world limitations)
5. maintainable (code is easy to read and its structure easy to understand)
6. extensible (easy to find the place to be modified)
7. a core "generic tree" data structure as the intermediate format, and a "yaml tree" data structure as the final format of the parsing process
P.S. I am willing to work on this "data transformation theory" stuff and a prototype implementation, because it is generally useful in many areas. But I need support.

--
anatoly t.
--
mertz@gnosis.cx
THIS MESSAGE WAS BROUGHT TO YOU BY: Postmodern Enterprises
IN A WORLD W/O WALLS, THERE WOULD BE NO GATES