[Python-Dev] ConfigParser shootout, preliminary entry

Fri Oct 22 00:28:34 CEST 2004

On Tue, 2004-10-19 at 11:00, Guido van Rossum wrote:

> 2. We're handling modest amounts of XML, all using home-grown DTDs and
> with no specific requirements to interface to other apps or XML tools.
> I wrote a metaclass which lets me specify the DTD using Python syntax.

Sounds like my recent situation.  I've done enough custom XML-ing lately
that I've been thinking along similar lines as you.  Note that most of
what I've written lately uses minidom, although I do have one particular
application that uses sax.  Both are powerful enough to do the job, but
neither is that intuitive, IMO.

> Again, my approach is slightly lower-level than previous proposals
> here but has the advantage of letting you be explicit about the
> mapping between Python and XML names, both for attributes and for
> subelements. The metaclass handles reading and writing. It supports
> elements containing text (is that CDATA? I never know)

I'm no XML guru, but I think they're different.  In the one case you
have something like:

<node>text for the node</node>

and in the other you have:

<node><![CDATA[cdata, er, um, data]]></node>

The difference is that the CDATA stuff shows up in a subnode of
<node> and has fewer restrictions on what data can be included within the
delimiters.

My applications use both.
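
A quick way to compare the two cases under minidom (a minimal sketch; the
sample strings are made up for illustration).  Note that minidom hands back
a child node in both cases: a Text node for plain text and a CDATASection
node for the CDATA form.

```python
from xml.dom import minidom, Node

# Made-up sample documents, one per case.
plain = minidom.parseString("<node>text for the node</node>")
cdata = minidom.parseString("<node><![CDATA[a < b && c]]></node>")

plain_child = plain.documentElement.firstChild
cdata_child = cdata.documentElement.firstChild

# Plain character data is stored as a Text child of <node>.
print(plain_child.nodeType == Node.TEXT_NODE)

# A CDATA section is stored as a CDATASection child; "<" and "&" can
# appear unescaped between the delimiters.
print(cdata_child.nodeType == Node.CDATA_SECTION_NODE)
print(cdata_child.data)
```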

>  or
> sub-elements, but not both. For sub-elements, it supports cases where
> one element has any number of sub-elements of a certain type, which
> are then collected in a list, so you can refer to them using Python
> sequence indexing/slicing notation. It also supports elements that
> have zero or one sub-element of a certain type; absence is indicated
> by setting the corresponding attribute to None. I don't support
> namespaces, although I expect it would be easy enough to add them. I
> don't support unrecognized elements or attributes: while everything
> can be omitted (and defaults to None), unrecognized attributes or
> elements are always rejected. (I suppose that could be fixed too if
> desired.) 

I have use cases for both behaviors.  OT1H, I generally want to reject
unknown elements or attributes, reject duplicate elements where my "DTD"
doesn't allow them, etc.  OTOH, in at least one case I'm doing something
that's probably evil, where sub-elements name email headers and the text
inside provides the data for the header.  I'm sure XML experts cringe at
that and suggest I use something like:

<header name="to">value</header>

or somesuch instead.
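
To make the two encodings concrete, here is a minimal minidom sketch that
reads either form into a dict.  The function names, the <headers> wrapper
element, and the sample addresses are all hypothetical, chosen only for
illustration.

```python
from xml.dom import minidom, Node


def _text(element):
    """Join the Text and CDATASection children of an element."""
    return "".join(
        n.data for n in element.childNodes
        if n.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    )


def headers_from_element_names(xml_text):
    """The "evil" form: each sub-element's name is the header name."""
    root = minidom.parseString(xml_text).documentElement
    headers = {}
    for child in root.childNodes:
        if child.nodeType != Node.ELEMENT_NODE:
            continue  # skip whitespace between elements
        headers[child.tagName] = _text(child)
    return headers


def headers_from_name_attribute(xml_text):
    """The fixed-vocabulary form: <header name="...">value</header>."""
    root = minidom.parseString(xml_text).documentElement
    headers = {}
    for child in root.getElementsByTagName("header"):
        headers[child.getAttribute("name")] = _text(child)
    return headers


evil = headers_from_element_names(
    "<headers><to>a@example.com</to><subject>hi</subject></headers>")
tidy = headers_from_name_attribute(
    '<headers><header name="to">a@example.com</header>'
    '<header name="subject">hi</header></headers>')
print(evil == tidy)
```

The first form cannot be checked against a fixed list of element names,
which is why it clashes with strict rejection of unknown elements; the
second keeps the element vocabulary closed and moves the open-ended part
into attribute values.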

> Here's an example:

[deleted]

That actually doesn't look too bad.  Do you think you'll be able to
release your stuff?  I don't have anything generic enough to be useful
yet, but I probably could release stuff if/when I do.

> I'm undecided on whether I like the approach with lists of (name,
> type) tuples better than the approach with property factories like in
> the first example; the list approach allows me to order the attributes
> and sub-elements consistently upon rendering, but I'm not particularly
> keen on typing string quotes around Python identifiers.

The property factories are nice, and I have the same aversion to string
quoting Python identifiers.  I personally have not had a use case for
retaining sub-element order.
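
For concreteness, a hypothetical sketch of the list-of-tuples flavor.  This
is not Guido's code (his example is trimmed above); the Element base class,
the attributes list, and the Message class are names invented here.  It
handles attributes only (no sub-elements or text), uses a plain base class
rather than a metaclass, and assumes the rules described in the quoted text:
anything omitted defaults to None and anything unrecognized is rejected.

```python
from xml.dom import minidom


class Element:
    """Hypothetical base class; subclasses declare their XML attributes."""

    tag = None
    # (python_name, xml_name, converter) tuples; list order drives rendering.
    attributes = []

    def __init__(self, **kwargs):
        known = [py for py, _, _ in self.attributes]
        unknown = sorted(set(kwargs) - set(known))
        if unknown:
            raise TypeError("unknown attributes: %s" % unknown)
        for py in known:
            setattr(self, py, kwargs.get(py))  # omitted values become None

    @classmethod
    def from_xml(cls, text):
        node = minidom.parseString(text).documentElement
        if node.tagName != cls.tag:
            raise ValueError("expected <%s>, got <%s>" % (cls.tag, node.tagName))
        by_xml = {xml: (py, conv) for py, xml, conv in cls.attributes}
        values = {}
        for name, value in node.attributes.items():
            if name not in by_xml:
                raise ValueError("unrecognized attribute: %r" % name)
            py, conv = by_xml[name]
            values[py] = conv(value)
        return cls(**values)

    def to_xml(self):
        node = minidom.Document().createElement(self.tag)
        for py, xml, _ in self.attributes:
            value = getattr(self, py)
            if value is not None:
                node.setAttribute(xml, str(value))
        return node.toxml()


class Message(Element):
    tag = "message"
    # "from" is a Python keyword, so the explicit name mapping earns its keep.
    attributes = [("msg_id", "id", int), ("sender", "from", str)]


msg = Message.from_xml('<message id="7" from="barry"/>')
print(msg.msg_id, msg.sender)
print(msg.to_xml())
```

The string quoting complained about above is visible in the attributes
list; a property-factory version would trade it for class-level
assignments, at the cost of losing the declared order.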

I may play with my own implementation of your spec and see how far I can
get.  I definitely would like to see /something/ at a higher abstraction
than minidom though.

-Barry
