[XML-SIG] qp_xml API (was: DOM API)

Greg Stein gstein@lyra.org
Mon, 19 Apr 1999 02:28:29 -0700


Fredrik Lundh wrote:
> one could imagine that once we've settled on an API,
> there could be different implementations of the tree
> builder...

Seems reasonable.

> perhaps the "qp API" could be turned into a "standard
> python light-weight dom-like interface"?  and to get that
> process started, maybe you could post an interface
> summary?

All right. Below is the summary. This is also the first opportunity for
public review, so I will welcome any suggestions for change.

qp_xml.error: the exception raised on errors; currently a string. [ed.
this "should" become a class]

qp_xml.Parser: the parser class. Typical use is: instantiate and call
the parse() method. The class is not thread-safe, but one-per-thread is
fine.

Parser.parse(input): input may be a string or an object supporting the
"read" method, e.g. a file or an httplib.HTTPResponse (from my new
httplib module). The input must represent a complete XML document. It
will be fully parsed and a lightweight representation will be returned.
This method may be called any number of times (for multiple documents).
The returned object is an instance of qp_xml._element.

_element.name: element ("tag") name

_element.ns: a Python string. The namespace URI this element's name
belongs to, or the empty string for "no namespace".

_element.lang: the xml:lang value that applies to this element's
attributes and content. It is taken from this element's own xml:lang
attribute if present, otherwise inherited from the parent. It is None
if no xml:lang is in scope.

_element.children: a Python list of the child elements, in order

_element.attrs: ### currently a list of objects, one per attribute, each
with ns, name, and value fields. This will change to a mapping of
{ (URI, name) : value }. ###
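
To make the planned change concrete, here is a rough sketch of going from
the current list form to the { (URI, name) : value } mapping. The Attr
class and attrs_as_mapping() are hypothetical stand-ins, not part of
qp_xml, and using the empty string for "no namespace" on attributes is an
assumption carried over from _element.ns.

```python
class Attr:
    """Hypothetical stand-in for one attribute object (ns, name, value)."""
    def __init__(self, ns, name, value):
        self.ns = ns
        self.name = name
        self.value = value


def attrs_as_mapping(attr_list):
    """Turn the list form into the planned {(URI, name): value} form."""
    return {(a.ns, a.name): a.value for a in attr_list}


# Two attributes: one without a namespace, one in a made-up namespace.
attrs = [
    Attr("", "id", "x1"),
    Attr("urn:example:links", "href", "doc.xml"),
]
mapping = attrs_as_mapping(attrs)

# Lookup is by (namespace URI, local name), so prefixes never matter.
print(mapping[("", "id")])
print(mapping[("urn:example:links", "href")])
```

With the mapping, a client asks for exactly the attribute it wants
instead of scanning the list and comparing ns and name by hand.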

_element.first_cdata: a Python string which contains the element's
contents that are between the start tag and the first child element (if
present, otherwise the contents between the start/end tags). This will
be the empty string in both cases: <foo/> and <foo></foo>.

_element.following_cdata: a Python string containing the PARENT
element's content which follows this element's end tag (up to the next
child element of the parent, or the parent's end tag).

qp_xml.dump(f, element): uses f.write() to dump the element as XML.
Namespace declarations and xml:lang attributes are inserted where
needed, and namespace prefixes are chosen automatically.

qp_xml.textof(element): return this element's text contents
(non-recursively: text inside child elements is not included).


The *_cdata fields are reasonably "interesting" ... Here is a sample of
a few elements and how the cdata fields are filled in:

<elem1>
  elem1.first_cdata contents
  <elem2>
    elem2.first_cdata contents
  </elem2>
  elem2.following_cdata contents
  <elem3/>
  elem3.following_cdata contents
</elem1>

The textof(elem1) function will return elem1.first_cdata +
elem2.following_cdata + elem3.following_cdata.
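
As a minimal sketch of that rule (not the actual qp_xml source), the
following builds the sample tree by hand and concatenates the fields the
same way. The Element class here is a hypothetical stand-in that carries
only the fields needed, and the one-letter strings stand in for the
"contents" text in the sample.

```python
class Element:
    """Hypothetical stand-in holding only the fields used here."""
    def __init__(self, name, first_cdata="", following_cdata=""):
        self.name = name
        self.children = []
        self.first_cdata = first_cdata
        self.following_cdata = following_cdata


def textof(elem):
    """Text directly inside elem, skipping text inside its children."""
    return elem.first_cdata + "".join(
        child.following_cdata for child in elem.children
    )


# Mirror of the sample: elem1 contains elem2 and elem3.
elem1 = Element("elem1", first_cdata="A")
elem2 = Element("elem2", first_cdata="B", following_cdata="C")
elem3 = Element("elem3", following_cdata="D")
elem1.children = [elem2, elem3]

print(textof(elem1))  # prints ACD
```

Note that "B" (elem2.first_cdata) is left out: it belongs to elem2, not
to elem1.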

The *_cdata fields preserve whitespace.


Commentary:

Note that clients only need to import qp_xml, instantiate
qp_xml.Parser(), and call parse() (which returns an object). They only
deal with one object type in the return value (qp_xml._element), and
they directly access the fields in it. The object defines no methods.

Most clients will use .name, .attrs, and .children. qp_xml.textof(elem)
will return the element's text contents. Certain clients may use .ns to
test if the element is in the namespace they are looking for; a few
clients will use .lang to interpret attribute values and element
contents.
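
To show how little a client has to deal with, here is a rough,
self-contained sketch of an alternative tree builder that fills in the
same fields, followed by some client-style field access. It is built on
the standard xml.parsers.expat module and is NOT qp_xml itself: the
Element class, parse() and split() are hypothetical names, and attrs
already uses the planned { (URI, name) : value } mapping.

```python
import xml.parsers.expat

XML_NS = "http://www.w3.org/XML/1998/namespace"


class Element:
    """Hypothetical stand-in with the fields described in the summary."""
    def __init__(self, ns, name):
        self.ns = ns
        self.name = name
        self.lang = None
        self.attrs = {}
        self.children = []
        self.first_cdata = ""
        self.following_cdata = ""


def split(qname):
    """Split expat's 'URI name' form; the URI is '' for no namespace."""
    uri, _, local = qname.rpartition(" ")
    return uri, local


def parse(text):
    stack = []   # currently open elements, innermost last
    roots = []   # will hold the single document element

    def start(qname, attrs):
        elem = Element(*split(qname))
        for key, value in attrs.items():
            elem.attrs[split(key)] = value
        # Own xml:lang wins; otherwise inherit from the parent (or None).
        inherited = stack[-1].lang if stack else None
        elem.lang = elem.attrs.get((XML_NS, "lang"), inherited)
        (stack[-1].children if stack else roots).append(elem)
        stack.append(elem)

    def end(qname):
        stack.pop()

    def chars(data):
        current = stack[-1]
        if current.children:
            # Text after a child's end tag belongs to that child.
            current.children[-1].following_cdata += data
        else:
            current.first_cdata += data

    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(text, True)
    return roots[0]


doc = parse('<a xmlns="urn:example" xml:lang="en">hi<b/>there<c x="1"/></a>')

# Typical client access: plain fields, no methods.
print(doc.name, doc.ns, doc.lang)
for child in doc.children:
    print(child.name, child.attrs, repr(child.following_cdata))
```

The client side is only the last few lines; everything above them is the
builder, which could be swapped for another implementation without the
client noticing.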


Cheers,
-g

--
Greg Stein, http://www.lyra.org/