[XML-SIG] New Reader Architecture
Mon, 06 Nov 2000 13:46:16 -0700
> > We have rewritten most of the code used for creating DOMs from text.
> > I've cc'ed xml-sig because the check-ins of 4DOM I'll be making
> > today reflect these changes.
> Very interesting. Are you following the DOM Level 3 discussions on
> load-and-save interfaces? [I couldn't access the draft right now, so
> I can't check whether it is related to your work]
Not yet. The first draft did not cover load and save at all. I haven't
perused the second draft, but at any rate it will have to be somewhat closer
to CR before we move to DOM L3. We were burned, in terms of wasted effort, by
moving to the draft DOM L2 namespaces and then having them change quite a bit.
> > Using one of the new reader classes is also simple. You create an
> > instance passing in to the constructor any parameters relevant to the
> > state of that class.
> While support for customization is a good thing, I think many users
> won't need it, or might get confused by it. So I'd prefer to have some
> guidelines on what the "good for most uses" way of getting a DOM is.
OK. I'll try to get some such doc in before release.
> > Once you have the reader instance, you use the fromStream or fromUri
> > method to create each DOM. The equivalents to the other common utility
> > reader functions (say fromString or fromFile) have been eliminated for
> > simplicity since it is trivial to turn text or a filename into a
> > stream.
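The stream wrapping mentioned above might look like the following in present-day Python. This is only a sketch: it uses the standard library's xml.dom.minidom as a stand-in for a 4DOM reader, and the from_string helper is a hypothetical name, not part of 4Suite.

```python
import io
from xml.dom import minidom


def from_string(text):
    """Hypothetical convenience wrapper: parse XML text by way of a stream."""
    # Wrap the text in an in-memory byte stream, then hand that stream
    # to a parser that only knows how to read from file-like objects.
    stream = io.BytesIO(text.encode("utf-8"))
    return minidom.parse(stream)


doc = from_string("<greeting>hello</greeting>")
print(doc.documentElement.tagName)
```

A one-line wrapper like this is all a fromString equivalent needs to be, which is the argument for leaving it out of the reader classes, and equally the argument that keeping it would cost little.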
> Can you please bring the fromString interface back? In interactive
> mode, it is a pain to type StringIO.StringIO.
> Also, what is the complication that makes urllib not work for fromUri?
> In the Python 2 SAX2 interfaces, you can pass a string to parse, and
> it will then consider that as a system identifier. In turn, it will
> pass it to urllib, which will open either a local file or the URL.
Ah, but not all URIs are URLs. What if you have a URN resolution handler?
This is something that will be especially relevant with 4Suite Server, which
provides URN/UUIDs for XML documents in the repositories, and also provides a
relevant URI handler which can easily be plugged into XPath, XSLT, RDF,
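
To illustrate why a pluggable handler matters, here is a rough sketch of a resolver that dispatches on the URI scheme and only falls back to urllib for ordinary URLs. The UriResolver class, the urn_handler function, the in-memory repository, and the example URN are all made up for illustration; none of them is the 4Suite API.

```python
import io
import urllib.request


class UriResolver:
    """Hypothetical resolver that maps URI schemes to handler callables."""

    def __init__(self):
        self._handlers = {}

    def register(self, scheme, handler):
        # A handler takes the full URI and returns a readable stream.
        self._handlers[scheme.lower()] = handler

    def resolve(self, uri):
        scheme = uri.split(":", 1)[0].lower()
        handler = self._handlers.get(scheme)
        if handler is not None:
            return handler(uri)
        # No custom handler: treat the URI as a plain URL (http:, file:, ...).
        return urllib.request.urlopen(uri)


# Illustrative in-memory "repository" keyed by URN; the URN is invented.
REPOSITORY = {"urn:uuid:00000000-0000-0000-0000-000000000001": b"<doc/>"}


def urn_handler(uri):
    """Look the URN up in the repository instead of going to the network."""
    return io.BytesIO(REPOSITORY[uri])


resolver = UriResolver()
resolver.register("urn", urn_handler)
stream = resolver.resolve("urn:uuid:00000000-0000-0000-0000-000000000001")
print(stream.read())
```

urllib on its own has no way to open the urn: URI here, which is the case the question about fromUri overlooks.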
> > [Note that the Domlette readers also have an argument to fromStream,
> > stripElements, for specifying elements from which white-space is to be
> > stripped while building the DOM. This is merely to support some
> > internal XSLT optimizations until a better way can be found. Using
> > these arguments is deprecated and they may be removed from the method
> > signatures in any future 4Suite release.]
> Isn't a validating parser supposed to indicate which elements can have
> their whitespace stripped?
Not directly, but of course you can use the ignorableWhitespace callback if
you're using SAX.
However, the reader support for stripping is a different matter
entirely. XSLT allows you to specify elements to be stripped from source
documents. Originally, 4XSLT would create the DOM normally and then strip
the relevant WS nodes, but this was horribly inefficient. We sped things up
severalfold by stripping the whitespace as we build the DOM. This is why
we have the interface, and it is also why it is not recommended for regular
use: it's pretty much a hack (but a very important hack) for XSLT performance.
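
The idea of stripping during the build can be sketched with the standard library's SAX and minidom modules. This is not the Domlette code: the StrippingBuilder class and the from_stream function are hypothetical, and strip_elements merely plays the role that stripElements is described as playing above.

```python
import io
import xml.sax
from xml.dom import minidom


class StrippingBuilder(xml.sax.ContentHandler):
    """Builds a minidom tree, skipping whitespace-only text in chosen elements."""

    def __init__(self, strip_elements=()):
        super().__init__()
        self.strip_elements = set(strip_elements)
        self.document = minidom.Document()
        self._stack = [self.document]
        self._pending = []  # text chunks not yet turned into a node

    def _flush_text(self):
        # Join buffered chunks so the whitespace test sees the whole text run.
        text = "".join(self._pending)
        self._pending = []
        if not text:
            return
        parent = self._stack[-1]
        if parent.nodeName in self.strip_elements and not text.strip():
            return  # never create the node, so nothing to remove later
        parent.appendChild(self.document.createTextNode(text))

    def startElement(self, name, attrs):
        self._flush_text()
        element = self.document.createElement(name)
        for key, value in attrs.items():
            element.setAttribute(key, value)
        self._stack[-1].appendChild(element)
        self._stack.append(element)

    def endElement(self, name):
        self._flush_text()
        self._stack.pop()

    def characters(self, content):
        self._pending.append(content)


def from_stream(stream, strip_elements=()):
    builder = StrippingBuilder(strip_elements)
    xml.sax.parse(stream, builder)
    return builder.document


source = b"<list>\n  <item> a </item>\n  <item>b</item>\n</list>"
doc = from_stream(io.BytesIO(source), strip_elements=["list"])
print([child.nodeName for child in doc.documentElement.childNodes])
```

Because the whitespace-only text nodes under list are never created, there is no second pass over the tree to find and remove them, which is where the build-then-strip approach lost its time.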
> > Python 1.x users can break circular dependencies by calling the
> > releaseNode method on the reader that was used to create the DOM:
> > reader.releaseNode(xml_doc)
> What kind of circularity does that break? The one in the tree? Does
> that mean I have to keep the reader until I release the tree?
Yes and maybe. You don't have to keep the reader around if you're sure what
type of DOM you have. However, if you try to call cDomlette's ReleaseNode on
a pDomlette node, it will break, and vice versa. That's why releaseNode is
also available on the reader instance as a convenience.
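
As a rough illustration of the pattern, here is a sketch in which the reader that built the tree also knows how to release it. It uses xml.dom.minidom, whose unlink() method breaks the parent/child cycles, in place of the Domlette implementations; the MinidomReader class is hypothetical.

```python
import io
from xml.dom import minidom


class MinidomReader:
    """Hypothetical reader that builds and releases its own kind of DOM."""

    def fromStream(self, stream):
        return minidom.parse(stream)

    def releaseNode(self, node):
        # unlink() clears the parent and child links, so plain reference
        # counting can reclaim the nodes without a cycle collector.
        node.unlink()


reader = MinidomReader()
xml_doc = reader.fromStream(io.BytesIO(b"<root><child/></root>"))
root = xml_doc.documentElement
reader.releaseNode(xml_doc)
print(root.parentNode)
```

Calling the method on the reader means the caller does not need to know which DOM implementation produced the node, only which reader did.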
Uche Ogbuji Principal Consultant
email@example.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python