[XML-SIG] XML to Object translation

Ken MacLeod ken@bitsko.slc.ut.us
23 Nov 2000 10:18:00 -0600

Rich Salz <rsalz@caveosystems.com> writes:

> > The idea is basically that when you have an XML document of a
> > particular type (say RSS) you make a mapping file that describes
> > how to create an object structure from the file. For RSS this
> > might look as shown below.
> I wonder if the syntax of that mapping file could be XSLT?

XSLT would work very well as long as it can handle the full
transformation.  In the XSLT stylesheet, the "output schema" would
follow a regular pattern, not unlike SOAP or WDDX, that can then
easily be converted to objects.
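
As an illustration only, here is a minimal Python sketch of that last
step.  It assumes the stylesheet has already emitted a small WDDX-like
vocabulary (struct/var/array/string/number); the function name
`decode` and the sample document are hypothetical, and this covers
only a subset of what WDDX or SOAP define.

```python
import xml.etree.ElementTree as ET

def decode(elem):
    """Turn one element of the regular output vocabulary into a Python value."""
    if elem.tag == "string":
        return elem.text or ""
    if elem.tag == "number":
        return float(elem.text)
    if elem.tag == "array":
        # Each child of an array is itself a value element.
        return [decode(child) for child in elem]
    if elem.tag == "struct":
        # Each <var name="..."> wraps exactly one value element.
        return {var.get("name"): decode(var[0]) for var in elem.findall("var")}
    raise ValueError("unexpected element: " + elem.tag)

# Hypothetical stylesheet output for an RSS-like channel.
SAMPLE = """
<struct>
  <var name="title"><string>News</string></var>
  <var name="items">
    <array>
      <struct><var name="rank"><number>1</number></var></struct>
    </array>
  </var>
</struct>
"""

channel = decode(ET.fromstring(SAMPLE))
```

Because the vocabulary is fixed, the decoder stays the same for every
document type; only the stylesheet changes.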

The one drawback to using XSLT is that many conversions to objects
require a little bit more code to get things to work well, so XSLT +
some embedded scripting would be even better.

I have a Perl module, ToObjects[1], that uses a technique very similar
to Paul Prescod's EasySAX[2]: each rule pairs an XML pattern to match
(like an XPath) with an argument saying what to do with the match.  In
EasySAX, the goal is to make parsing easier:

Paul Prescod writes in [2]:
> Here are the basic ideas:
>  * handlers pass DOM nodes. (a la "Marrying" and Pyxie)
>  * all parent nodes are available (not really that expensive usually)
>  * peer and child nodes are not typically available
>  * callbacks are labelled with *full XPaths* (a la XSLT)
>          * abuses "docstrings" (a la "spark")
> def start_spam( self, textNode ):
>     "figure/title/text()"
>     print "Figure title:"+`textNode`
>     print "tagname:"+textNode.parentNode.tagName
>  * you can ask for an event to be handed to you as a tree
> def start_applets(self,elementNode):
>     "object/applet as tree"
>     for node in elementNode.childNodes:
>         print node
>  * if you do, then child nodes are available
>  * the tree disappears after the end event
>  * handles namespaces

It's the "as tree" part that ToObjects is similar to, and ToObjects'
goal is building an object tree from XML.
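
For readers who have not seen the docstring trick in the quoted
outline, the following is a rough sketch of how path-labelled
callbacks might be dispatched.  It is not EasySAX's actual code: the
`dispatch` function, the `Demo` class, and the simple "path suffix"
matching (rather than full XPaths) are all assumptions made for
illustration.

```python
import xml.etree.ElementTree as ET

def dispatch(root, handlers):
    """Call each handler method whose docstring path matches an element."""
    # Collect (pattern, method) pairs from public methods with docstrings.
    table = []
    for name in dir(handlers):
        method = getattr(handlers, name)
        if callable(method) and not name.startswith("_") and method.__doc__:
            table.append((method.__doc__.strip().split("/"), method))

    def walk(elem, ancestors):
        path = [a.tag for a in ancestors] + [elem.tag]
        for pattern, method in table:
            # A pattern matches when it equals the tail of the element path.
            if path[-len(pattern):] == pattern:
                method(elem, ancestors)
        for child in elem:
            walk(child, ancestors + [elem])

    walk(root, [])

class Demo:
    """Hypothetical handler class; the docstring is the pattern."""
    def __init__(self):
        self.seen = []

    def figure_title(self, elem, ancestors):
        "figure/title"
        # Parent nodes are available, as in the outline above.
        self.seen.append((elem.text, ancestors[-1].tag))

demo = Demo()
dispatch(ET.fromstring(
    "<doc><title>Top</title><figure><title>Fig 1</title></figure></doc>"),
    demo)
```

Only the title inside a figure reaches the handler; the top-level
title does not match the "figure/title" pattern.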

ToObjects has specifiers for creating objects, for storing or
appending the content of elements in a field, for defaulting values,
for keeping an element's contents as a DOM tree, for taking element
content "as string", etc.  You can also provide a procedure to be
called when a pattern matches.  The example translation in [1], minus
the Perl syntax, looks something like this:

  schema       -holder
  table        -make Schema.Table
  name         -field name -as-string
  summary      -field summary -as-string
  description  -field description -as-dom
  column       -make Schema.Column -push-field columns
  unique       -field unique -value 1
  non-null     -field non_null -value 1
  default      -field default -as-string

Note, of course, that this can be represented as XML, as Python doc
strings (a la EasySAX), or as XSLT with script extensions.  I'm just
suggesting the type of operations that would be common in converting
arbitrary XML into objects.
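
To make those operations concrete, here is a small Python sketch that
interprets a rule table like the one above.  It is not the ToObjects
implementation: the `Node` class, the dictionary encoding of the
rules, and the sample document are hypothetical, and elements are
matched by tag name only rather than by full patterns.

```python
import xml.etree.ElementTree as ET

class Node:
    """Generic stand-in for classes such as Schema.Table (hypothetical)."""
    def __init__(self, kind):
        self.kind = kind
        self.fields = {}

# The rule table from above, re-encoded as Python data.
RULES = {
    "schema":      {"holder": True},
    "table":       {"make": "Schema.Table"},
    "name":        {"field": "name", "as": "string"},
    "summary":     {"field": "summary", "as": "string"},
    "description": {"field": "description", "as": "dom"},
    "column":      {"make": "Schema.Column", "push_field": "columns"},
    "unique":      {"field": "unique", "value": 1},
    "non-null":    {"field": "non_null", "value": 1},
    "default":     {"field": "default", "as": "string"},
}

def convert(elem, parent, results):
    rule = RULES.get(elem.tag, {})
    target = parent
    if "make" in rule:
        # Create an object; push it onto a parent field or keep it top-level.
        target = Node(rule["make"])
        if "push_field" in rule and parent is not None:
            parent.fields.setdefault(rule["push_field"], []).append(target)
        else:
            results.append(target)
    elif "field" in rule and parent is not None:
        if "value" in rule:
            parent.fields[rule["field"]] = rule["value"]
        elif rule["as"] == "string":
            parent.fields[rule["field"]] = "".join(elem.itertext()).strip()
        else:
            # "dom": keep the element subtree itself.
            parent.fields[rule["field"]] = elem
        return  # field content is consumed here, so do not descend
    for child in elem:
        convert(child, target, results)

def to_objects(xml_text):
    results = []
    convert(ET.fromstring(xml_text), None, results)
    return results

SAMPLE = """
<schema>
  <table>
    <name>users</name>
    <description>All <b>known</b> users</description>
    <column><name>id</name><unique/><non-null/></column>
    <column><name>email</name><default>none</default></column>
  </table>
</schema>
"""

tables = to_objects(SAMPLE)
```

The holder rule creates nothing itself; it only lets the walk descend
so that each table becomes a top-level object.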

I used a similar technique to map DocBook, LinuxDoc, and TEI to
"generic document objects" before formatting the GDOs, and it worked
very well.

Although not on the immediate ToDo list, this type of processing is
important to the project I'm currently working on, Orchard[3], where
converting XML into nodes is going to be a common process.

  -- Ken

[1] <http://bitsko.slc.ut.us/libxml-perl/XML%3A%3APatAct%3A%3AToObjects.html>
[2] <http://www.python.org/pipermail/xml-sig/2000-March/003655.html>
[3] <http://beauvoir.phil.unc.edu/groves>