[XML-SIG] qp_xml API (was: DOM API)
Greg Stein
gstein@lyra.org
Mon, 19 Apr 1999 02:28:29 -0700
Fredrik Lundh wrote:
> one could imagine that once we've settled on an API,
> there could be different implementations of the tree
> builder...
Seems reasonable.
> perhaps the "qp API" could be turned into a "standard
> python light-weight dom-like interface"? and to get that
> process started, maybe you could post an interface
> summary?
All right. Below is the summary. This is also the first opportunity for
public review, so I will welcome any suggestions for change.
qp_xml.error: a string for exceptions. [ed. this "should" become a
class]
qp_xml.Parser: the parser class. Typical use is: instantiate and call
the parse() method. The class is not thread-safe, but one-per-thread is
fine.
Parser.parse(input): input may be a string or an object supporting the
"read" method (e.g. a file, or an httplib.HTTPResponse from my new
httplib module). The input must represent a complete XML document. It will be
fully parsed and a lightweight representation will be returned. This
method may be called any number of times (for multiple documents). The
returned object is an instance of qp_xml._element.
_element.name: element ("tag") name
_element.ns: a Python string. The namespace URI this element's name
belongs to, or the empty string for "no namespace".
_element.lang: the xml:lang value that applies to this element's
attributes and content. It is inherited from the parent, pulled from
this element's attributes, or is None if no xml:lang is in scope.
_element.children: a Python list of the child elements, in order
_element.attrs: ### currently a list of objects representing attributes,
each object having ns, name, and value fields. This will change to a
mapping of { (URI, name) : value }. ###
_element.first_cdata: a Python string containing the element's content
between its start tag and its first child element or, if it has no
children, everything between the start and end tags. This will be the
empty string in both cases: <foo/> and <foo></foo>.
_element.following_cdata: a Python string containing the PARENT
element's content which follows this element's end tag (up to the next
child element of the parent, or the parent's end tag).
qp_xml.dump(f, element): uses f.write() to dump the element as XML.
Namespaces and xml:lang values will be inserted. Automatic selection of
namespace prefixes will be used as appropriate.
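qp_xml.textof(element): return this element's text contents
(non-recursively), i.e. the character data directly inside the element,
excluding any text inside its child elements.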
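As a rough illustration of the planned _element.attrs mapping, the
sketch below uses a plain dict keyed by (URI, name) tuples. The sample
attributes, the get_attr helper, and the use of the empty string as the
URI for un-namespaced attributes (mirroring _element.ns) are assumptions
made for this example, not part of qp_xml.

```python
# Hypothetical attribute mapping in the planned { (URI, name) : value } shape.
# Assumption: un-namespaced attributes use "" as the URI, like _element.ns.
attrs = {
    ("", "id"): "intro",
    ("http://www.w3.org/1999/xlink", "href"): "chapter2.xml",
}


def get_attr(attrs, name, ns="", default=None):
    """Look up an attribute by local name and namespace URI."""
    return attrs.get((ns, name), default)


print(get_attr(attrs, "id"))
print(get_attr(attrs, "href", "http://www.w3.org/1999/xlink"))
print(get_attr(attrs, "href"))  # no un-namespaced "href", so the default
```

Keying on the (URI, name) pair means two attributes with the same local
name in different namespaces never collide, and a lookup is a single
dictionary access.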
The *_cdata fields are reasonably "interesting" ... Here is a sample of
a few elements and how the cdata fields are filled in:
<elem1>
  elem1.first_cdata contents
  <elem2>
    elem2.first_cdata contents
  </elem2>
  elem2.following_cdata contents
  <elem3/>
  elem3.following_cdata contents
</elem1>
The textof(elem1) function will return elem1.first_cdata +
elem2.following_cdata + elem3.following_cdata.
The *_cdata fields preserve whitespace.
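To make the *_cdata rules concrete, here is a minimal sketch of a tree
builder that fills in the two fields as described above. It is built on
the standard xml.parsers.expat module and is not the qp_xml
implementation: the Element class and the parse() and textof() functions
are stand-ins written for this example, and namespaces, attributes, and
xml:lang are ignored.

```python
import xml.parsers.expat


class Element:
    """Stand-in for the element objects described above (not qp_xml._element)."""

    def __init__(self, name):
        self.name = name
        self.children = []
        self.first_cdata = ""
        self.following_cdata = ""


def parse(text):
    """Build a tree of Element objects from a complete XML document string."""
    root = None
    stack = []  # currently open elements, innermost last

    def start(name, attrs):
        nonlocal root
        elem = Element(name)
        if stack:
            stack[-1].children.append(elem)
        else:
            root = elem
        stack.append(elem)

    def end(name):
        stack.pop()

    def cdata(data):
        if not stack:
            return
        parent = stack[-1]
        if parent.children:
            # Text after a child's end tag is stored on that child.
            parent.children[-1].following_cdata += data
        else:
            # No child seen yet: text belongs to the open element itself.
            parent.first_cdata += data

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = cdata
    parser.Parse(text, True)
    return root


def textof(elem):
    """Non-recursive text: first_cdata plus each child's following_cdata."""
    return elem.first_cdata + "".join(c.following_cdata for c in elem.children)


elem1 = parse("<elem1>A<elem2>B</elem2>C<elem3/>D</elem1>")
elem2, elem3 = elem1.children
```

With this input, elem1.first_cdata is "A", elem2.first_cdata is "B",
elem2.following_cdata is "C", elem3.following_cdata is "D", and
textof(elem1) gives "ACD".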
Commentary:
Note that clients only need to import qp_xml, instantiate
qp_xml.Parser(), and call parse() (which returns an object). They only
deal with one object type in the return value (qp_xml._element), and
they directly access the fields in it. The object defines no methods.
Most clients will use .name, .attrs, and .children. qp_xml.textof(elem)
will return the element's text contents. Certain clients may use .ns to
test if the element is in the namespace they are looking for; a few
clients will use .lang to interpret attribute values and element
contents.
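A sketch of what such client code might look like, walking the tree
through .name, .ns, and .children only. The Element class below is a
hand-built stand-in with the same field names (it is not
qp_xml._element), and find_all and the sample "DAV:" tree are
hypothetical, made up for this example.

```python
class Element:
    """Stand-in with the field names from the summary (not qp_xml._element)."""

    def __init__(self, name, ns="", children=None):
        self.name = name
        self.ns = ns
        self.children = children if children is not None else []


def find_all(elem, name, ns=""):
    """Yield elem and its descendants that match name and ns, in document order."""
    if elem.name == name and elem.ns == ns:
        yield elem
    for child in elem.children:
        yield from find_all(child, name, ns)


# Hypothetical tree: a "multistatus" element holding two "response"
# elements in the "DAV:" namespace and one un-namespaced "note".
tree = Element("multistatus", "DAV:", [
    Element("response", "DAV:"),
    Element("note"),
    Element("response", "DAV:"),
])

matches = list(find_all(tree, "response", "DAV:"))
```

Because the objects carry no methods, a client needs nothing beyond
attribute access and ordinary list iteration to navigate the result.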
Cheers,
-g
--
Greg Stein, http://www.lyra.org/