On Mon, Jun 3, 2013 at 2:53 AM, Andrew Barnert firstname.lastname@example.org wrote:
From: anatoly techtonik email@example.com Sent: Sunday, June 2, 2013 11:23 AM
FWIW, I am +1 on the ability to read YAML-based configs in Python without dependencies, but waiting for several years is hard.
With all due respect, I don't think you've read even a one-sentence description of YAML, so your entire post is nonsense.
I'll try to clarify my post, so that it will be clear for you. Please, ask if something is unclear.
You're right. I don't read specifications before using things. What do I personally need from YAML? These are examples of files I use daily:
http://tmuxp.readthedocs.org/en/latest/examples.html
http://code.google.com/p/rietveld/source/browse/app.yaml
https://github.com/agschwender/pilbox/blob/master/provisioning/playbook.yml
http://pastebin.com/RG7g260k (OpenXcom save format)
The first sentence of the abstract says, "YAML… is a…data serialization language designed around the common native data types of agile programming languages." So, your idea that we shouldn't use it for serialization, and shouldn't map it to native Python data types, is ridiculous.
I don't really care about the abstract. I am a complaining user, not the smart guy who wrote the spec. So my thinking is the following:
1. None of the examples above is a persistence format for serialized native programming-language data types. These are just nested mappings and lists: a strictly two-dimensional tree data structure, even the OpenXcom one. It is YAML, or, as I said, a subset of YAML, and that's why I deliberately called this format "yamlish".
2. Regardless of any desire to use this proposal as an opportunity to see the full YAML 1.2 spec implemented in the Python stdlib, I am going to resist. I need to work with a *safe data format* that is "human friendly", and I put *safe format* over *serialization format*.
You specifically suggest mapping YAML to XML so we can treat it as a structured document. From the "Relation to XML" section: "YAML is primarily a data serialization language. XML was… designed to support structured documentation."
Where? Oh, do you mean this one:
"The ideal output for the first version should be generic tree structure with defined names for YAML elements. The tree that can be represented as XML where these names are tags. "
It is not about "structured document", it is about "structured data format".
"tree that can be represented as XML" is not "XML tree". XML here is just an example of a structured nested format that everybody is aware of. What I mean is that this "tree structure" should be plain, and a 1:1 mapping to XML is a necessary and sufficient requirement.
You suggest that we shouldn't build all of YAML, just some bare-minimum subset that's good enough to get started. JSON is already _more_ than a bare-minimum subset of YAML, so we're already done.
I didn't know that JSON is not compatible with YAML. Still, I am not sure I understand how your argument that "JSON is not YAML" makes us "done" with a minimal implementation of YAML.
The module name, "yamlish", defines its purpose as something that my poor language skills can verbalize as "provide support for parsing and writing files in formats that are subsets of YAML used to store generic, user-editable, not Python-specific declarative data, such as configurations, save files, settings, etc.". Because I am not a CS major, I can't describe exactly what the examples I provided have in common, or how they differ from the usual programming-language objects serialized into YAML. I feel that these examples are "yamlish", and I would very much appreciate it if somebody could come up with a proper *definition* of the characteristics of the simple data formats (which are still YAML) that give this feeling.
Such a definition will greatly help to keep it moving in the right direction.
But you'd also like some data-driven way to extend this. YAML has already designed exactly that. Once you have the core schema, you can add new types, and the syntax for those types is data-driven (although the semantics are really only defined in hand-wavy English and probably require code to implement, but I'm not sure how you expect your proposal to be any different, unless you're proposing something like XML Schema). So, either the necessary subset of YAML you want is the entire spec, or you want to do an equal amount of work building something just as complex but not actually YAML.
No, it is not data-driven support for extension in the "yamlish" format. It is a data-driven process of writing a parser for "yamlish": you get one example, parse it, get the output, and write a test; then you get another, parse it, get the output, and run the previous tests. The "yamlish" format is only for common, human-understandable data files.
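To make the example-driven process above concrete, here is a minimal sketch. It assumes a hypothetical first-iteration parser, `parse_flat`, that only understands flat `key: value` lines; the names `parse_flat`, `EXAMPLES` and `check_examples` are illustrative and do not come from any existing module. Every supported sample is stored next to its expected output, so adding a new sample re-runs all the earlier ones.

```python
def parse_flat(text):
    """Hypothetical first-iteration parser: flat 'key: value' lines only."""
    result = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and full-line comments
        key, sep, value = stripped.partition(":")
        if not sep:
            raise ValueError("expected 'key: value', got %r" % line)
        result[key.strip()] = value.strip()  # values stay plain strings
    return result


# Each real-world sample is recorded together with its expected output.
# Samples are only ever appended, so older ones act as regression tests.
EXAMPLES = [
    ("session_name: demo\n", {"session_name": "demo"}),
    ("# comment\nname: demo\nshell: bash\n", {"name": "demo", "shell": "bash"}),
]


def check_examples(parse, examples):
    """Return the indexes of the examples whose parsed output differs."""
    return [i for i, (text, expected) in enumerate(examples)
            if parse(text) != expected]


print(check_examples(parse_flat, EXAMPLES))
```

A parser change that alters the output for any stored sample shows up as a non-empty list, which is the "no conflicting formats" rule expressed as a test.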
Perhaps expanding the idea of the "yamlish" format with a development process and with details of my "own data transformation theory" was not a good idea, but it was the only chance to find the motivation to write the stuff down. =) Sorry for the overload, and let me clarify things a little. I proposed a process for extending the "yamlish" parser to parse more backward-compatible "yamlish" data formats. There is no mechanism to support conflicting formats, or formats that change the output for existing inputs. That's it. There is no additional API for full YAML, so there is no complexity involved in maintaining and supporting extra features or the full YAML spec.
The `datatrans` framework I was speaking about is a possible implementation of a library to transform 2D structures between different formats. You know, the data transformation process is all the same at some level. On the level above, I can even say that everything we do in CS is just data transformation. It is not related to the "yamlish" format definition. The only important thing is that "datatrans" enables many input and many output formats that can be represented as a 2D annotated (or generic) tree.
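As an illustration of one such output format, the sketch below renders a generic tree of mappings, lists and scalars as XML using the stdlib `xml.etree.ElementTree`. The element names (`mapping`, `entry`, `sequence`, `scalar`) and the function name `to_element` are assumptions made up for this example; they are not part of `datatrans` or of any YAML specification.

```python
import xml.etree.ElementTree as ET


def to_element(node):
    """Convert nested dicts, lists and scalars into an XML element tree.

    The tag names used here are hypothetical; any fixed set of names
    for the tree's node kinds would work the same way.
    """
    if isinstance(node, dict):
        elem = ET.Element("mapping")
        for key, value in node.items():
            entry = ET.SubElement(elem, "entry", key=str(key))
            entry.append(to_element(value))
        return elem
    if isinstance(node, list):
        elem = ET.Element("sequence")
        for item in node:
            elem.append(to_element(item))
        return elem
    elem = ET.Element("scalar")
    elem.text = str(node)  # scalars are emitted as plain text
    return elem


tree = {"session_name": "dev", "windows": [{"layout": "tiled"}]}
print(ET.tostring(to_element(tree), encoding="unicode"))
```

Because every node kind maps to exactly one element, the conversion is 1:1 and needs no knowledge of what the data means.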
The idea of building a useful subset of YAML isn't a bad one. But the way to do that is to go through the features of YAML that JSON doesn't have, and decide which ones you want. For example, YAML with the core schema, but no aliases, no plain strings, and no explicit tags is basically JSON with indented block structure, raw strings, and useful synonyms for key constants (so you can write True instead of true). You could even carefully import a few useful definitions from the type library as long as they're unambiguous (e.g., timestamp). That gives you most of the advantages of YAML that don't bring any safety risks, and its output would be interpretable as full YAML, and it might be a little easier to implement than the full spec. But that has very little to do with your proposal. In particular, leaving out the data-driven features of YAML is what makes it safe and simple.
Now I feel that we are basically thinking about the same things - simplicity and safety.
I didn't read the spec, so I don't know what is in the core YAML schema, and here you know much better than I do what needs to be filtered out. My thought was to use examples to see what should be filtered out, because iterating over the spec will bring in many more "useful" features that forward-thinking people might want, but which may be harmful for keeping this small and simple.
I really like the brevity of YAML compared to JSON and other structured data formats (the tmuxp example page is a good one). Support for an indented data format is also natural for an indented language. But it is hard to get the format right and not spoil it with overengineering.
I believe that this "data-driven features of YAML" phrase is the point of confusion. I recall that the YAML spec provides some declarative mechanism for extensions. That is not what I mean. My data-driven approach is just "don't design anything upfront, use existing widely used data examples as a spec of the data that needs to be parsed". And yes - I don't need this YAML extensibility feature, which I too believe makes YAML unsafe.
I need YAML as a format for indented data in a text file. Nothing more. YAML without the "extra processing" that leads to potential hacks and execution of unwanted code. I just want to make sure that the data format is safe. Currently, the Python stdlib lacks a safe serialization format - the docs are bleeding red with warnings without specifying any alternatives. I like to call it "yamlish", because if it is named YAML, people will demand dynamism and OOPy "constructor/destructor" tricks, and sooner or later the module users will be pwned, as happened with other serialization modules before. Therefore I don't want "serialization as a feature", but I don't mind "serialization as a side effect" if it is compatible with a good intuitive API AND improves the speed without sacrificing _clarity_ and safety.
_clarity_ here is the understanding that "there is no way that the 'yamlish' format can be unsafe" at all times.
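A minimal sketch of what "indented data in a text file, nothing more" could look like follows. It is an assumption-laden illustration, not a YAML implementation: the function names are hypothetical, only block mappings, block lists and full-line comments are handled, every scalar stays a plain string, and there are no quotes, tags, aliases or flow syntax. The result can only ever be built from `dict`, `list`, `str` and `None`, so there is nothing to execute.

```python
def parse_yamlish(text):
    """Parse nested 'key: value' mappings and '- item' lists by indentation."""
    lines = []
    for raw in text.splitlines():
        stripped = raw.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and full-line comments carry no data
        leading = raw[:len(raw) - len(raw.lstrip())]
        if "\t" in leading:
            raise ValueError("tabs are not allowed in indentation")
        lines.append((len(leading), stripped))
    if not lines:
        return None
    value, pos = _parse_block(lines, 0, lines[0][0])
    if pos != len(lines):
        raise ValueError("unexpected indentation near %r" % lines[pos][1])
    return value


def _parse_block(lines, pos, indent):
    # A block is a list if its first line is a '- ' item, else a mapping.
    if lines[pos][1].startswith("- "):
        return _parse_list(lines, pos, indent)
    return _parse_mapping(lines, pos, indent)


def _parse_list(lines, pos, indent):
    items = []
    while (pos < len(lines) and lines[pos][0] == indent
           and lines[pos][1].startswith("- ")):
        after_dash = lines[pos][1][1:]
        rest = after_dash.lstrip()
        if rest.endswith(":") or ": " in rest:
            # '- key: value' opens a mapping; re-label the line so that
            # its sibling keys on the following lines share its indent.
            item_indent = indent + 1 + len(after_dash) - len(rest)
            lines[pos] = (item_indent, rest)
            item, pos = _parse_mapping(lines, pos, item_indent)
        else:
            item, pos = rest, pos + 1
        items.append(item)
    return items, pos


def _parse_mapping(lines, pos, indent):
    result = {}
    while pos < len(lines) and lines[pos][0] == indent:
        content = lines[pos][1]
        if content.endswith(":"):
            key, value = content[:-1], ""
        elif ": " in content:
            key, value = content.split(": ", 1)
        else:
            raise ValueError("expected 'key: value', got %r" % content)
        key, value = key.strip(), value.strip()
        pos += 1
        if value:
            result[key] = value  # scalars are never interpreted
        elif pos < len(lines) and (
                lines[pos][0] > indent
                or (lines[pos][0] == indent
                    and lines[pos][1].startswith("- "))):
            # Nested block: deeper indent, or a list at the key's indent.
            result[key], pos = _parse_block(lines, pos, lines[pos][0])
        else:
            result[key] = None
    return result, pos
```

For a tmuxp-style input such as `"windows:\n  - window_name: editor\n    panes:\n      - vim\n"`, the function returns nested dicts and lists of strings. Whether leaving all scalars as strings is acceptable, or whether a few unambiguous types should be recognized, is exactly the kind of decision the examples would have to drive.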
Meanwhile, I think what you actually want is XSLT processors to convert YAML to and from XML. Fortunately, the YAML community is already working on that at http://www.yaml.org/xml. Then you don't need any new Python code at all; just convert your YAML to XML and use whichever XML library (in the stdlib or not), and you're done.
XSLT, that declarative Turing-complete language. I am fed up with it. Complexity and performance ruin the beautiful theory. I think that Turing-completeness is a trap - solving its gestalts gives a good feeling when you learn it, but it has nothing to do with real-world problems. XSLT processors hog memory AND are slow at the same time. Debugging XSLT is impossible, because the process is obscure. I guess that it is also easily exploitable for DoS. XSLT? Not anymore, thanks.
XML has only one advantage over all other formats - auto-discoverable validation schemas. That's why it is still so popular.
FWIW, right now Python doesn't have any safe native data type for structured data - only linked objects and references. Some time ago I tried to introduce a solution for handling structured data by proposing 2D (two-dimensional) terminology with a generic tree as the base type. But the post became too complicated and lacked pictures, and I was unable to keep up the communication.
I don't want this idea to be laid to rest in the mailing list archives, so if you know how to write such a minimal, safe, fast and maintainable parser in Python, please tell me.
If an additional parser language is inevitable, maybe somebody knows of a comparison site similar to what http://todomvc.com/ does for MV* frameworks.