On Tue, 2004-10-19 at 11:00, Guido van Rossum wrote:
- We're handling modest amounts of XML, all using home-grown DTDs and
with no specific requirements to interface to other apps or XML tools. I wrote a metaclass which lets me specify the DTD using Python syntax.
Sounds like my recent situation. I've done enough custom XML-ing lately that I've been thinking along similar lines. Note that most of what I've written lately uses minidom, although I do have one particular application that uses sax. Both are powerful enough to do the job, but neither is that intuitive, IMO.
Again, my approach is slightly lower-level than previous proposals here but has the advantage of letting you be explicit about the mapping between Python and XML names, both for attributes and for subelements. The metaclass handles reading and writing. It supports elements containing text (is that CDATA? I never know)
I'm no XML guru, but I think they're different. In the one case you have something like:
<node>text for the node</node>
and in the other you have:
<node><![CDATA[cdata, er, um, data]]></node>
The difference is that the CDATA stuff shows up in a subnode of <node> and has fewer restrictions on what data can be included within the delimiters.
My applications use both.
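For what it's worth, here is a minimal sketch of how the two cases look from minidom. The helper name child_kinds is made up for illustration; it only reports the node type and character data of each child of the document element, so you can see that plain text arrives as a Text node and a CDATA section as a CDATASection node.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def child_kinds(xml_text):
    """Return (nodeType, data) for each child of the document element.

    Hypothetical helper; assumes every child is a character-data node.
    """
    root = parseString(xml_text).documentElement
    return [(child.nodeType, child.data) for child in root.childNodes]


# Plain element text: markup characters would have to be escaped.
plain = child_kinds("<node>text for the node</node>")

# CDATA section: '<' and '&' may appear literally between the delimiters.
cdata = child_kinds("<node><![CDATA[a < b && c]]></node>")

print(plain[0][0] == Node.TEXT_NODE)
print(cdata[0][0] == Node.CDATA_SECTION_NODE)
print(cdata[0][1])
```

Code that only cares about the characters can treat both node types alike by reading .data.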
or sub-elements, but not both. For sub-elements, it supports cases where one element has any number of sub-elements of a certain type, which are then collected in a list, so you can refer to them using Python sequence indexing/slicing notation. It also supports elements that have zero or one sub-element of a certain type; absence is indicated by setting the corresponding attribute to None. I don't support namespaces, although I expect it would be easy enough to add them. I don't support unrecognized elements or attributes: while everything can be omitted (and defaults to None), unrecognized attributes or elements are always rejected. (I suppose that could be fixed too if desired.)
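To make sure I read that right, here is a rough sketch of the rules just described. It is not the metaclass itself, only a plain function written against ElementTree, and load, attrs, optional and repeated are names invented for this example. Each argument is assumed to be a dict mapping a Python name to an XML name.

```python
import xml.etree.ElementTree as ET
from types import SimpleNamespace


def load(elem, attrs=None, optional=None, repeated=None):
    """Map one element onto a plain object (illustrative only).

    attrs, optional and repeated are hypothetical {python_name: xml_name}
    dicts for attributes, zero-or-one children and any-number children.
    """
    attrs, optional, repeated = attrs or {}, optional or {}, repeated or {}

    # Unrecognized attributes or sub-elements are always rejected.
    unknown = set(elem.attrib) - set(attrs.values())
    if unknown:
        raise ValueError(f"unrecognized attributes: {sorted(unknown)}")
    allowed = set(optional.values()) | set(repeated.values())
    unknown = {child.tag for child in elem} - allowed
    if unknown:
        raise ValueError(f"unrecognized elements: {sorted(unknown)}")

    obj = SimpleNamespace()
    for py_name, xml_name in attrs.items():
        setattr(obj, py_name, elem.get(xml_name))  # None when omitted
    for py_name, xml_name in optional.items():
        found = elem.findall(xml_name)
        if len(found) > 1:
            raise ValueError(f"at most one <{xml_name}> is allowed")
        setattr(obj, py_name, found[0] if found else None)
    for py_name, xml_name in repeated.items():
        setattr(obj, py_name, elem.findall(xml_name))  # a list, so it slices
    return obj


doc = ET.fromstring('<feed title="News"><item>a</item><item>b</item></feed>')
feed = load(doc, attrs={"title": "title"},
            optional={"author": "author"}, repeated={"items": "item"})
print(feed.title, feed.author, [item.text for item in feed.items[:1]])
```

Namespaces and the text-versus-sub-elements rule are left out to keep the sketch short.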
I have use cases for both behaviors. OT1H, I generally want to reject unknown elements or attributes, reject duplicate elements where my "DTD" doesn't allow them, etc. In at least one case I'm doing something that's probably evil, where sub-elements name email headers and the text inside provides the data for the header. I'm sure XML experts cringe at that and suggest I use something like:
or somesuch instead.
Here's an example:
That actually doesn't look too bad. Do you think you'll be able to release your stuff? I don't have anything generic enough to be useful yet, but I probably could release stuff if/when I do.
I'm undecided on whether I like the approach with lists of (name, type) tuples better than the approach with property factories like in the first example; the list approach allows me to order the attributes and sub-elements consistently upon rendering, but I'm not particularly keen on typing string quotes around Python identifiers.
The property factories are nice, and I have the same aversion to string quoting Python identifiers. I personally have not had a use case for retaining sub-element order.
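For comparison, here is a small sketch of the two declaration styles side by side. Everything in it (Attr, FactoryStyle, ListStyle, __attributes__, declared_names) is a made-up name for illustration and not taken from either of our implementations. It also assumes Python 3.6 or later, where descriptors get __set_name__ and class bodies keep their definition order, so the factory style can recover an ordering without quoted identifiers.

```python
class Attr:
    """Hypothetical property factory: one descriptor per XML attribute."""

    def __init__(self, xml_name, convert=str):
        self.xml_name = xml_name
        self.convert = convert

    def __set_name__(self, owner, name):
        # The Python identifier is picked up from the class body itself.
        self.py_name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get(self.py_name)  # None when never set

    def __set__(self, obj, value):
        obj.__dict__[self.py_name] = None if value is None else self.convert(value)


class FactoryStyle:
    # Property-factory style: no string quotes around Python identifiers.
    width = Attr("width", int)
    title = Attr("title")


class ListStyle:
    # List style: explicit order, but identifiers are quoted strings.
    __attributes__ = [("width", int), ("title", str)]


def declared_names(cls):
    """Return attribute names in declaration order for either style."""
    if hasattr(cls, "__attributes__"):
        return [name for name, _ in cls.__attributes__]
    return [name for name, value in vars(cls).items() if isinstance(value, Attr)]


box = FactoryStyle()
box.width = "3"
print(box.width, box.title)
print(declared_names(FactoryStyle), declared_names(ListStyle))
```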
I may play with my own implementation of your spec and see how far I can get. I definitely would like to see /something/ at a higher abstraction than minidom though.