[XML-SIG] SQL -> XML ?

Sun, 14 Feb 1999 16:23:54 -0600

[Paul Prescod:]

> As is often the case on mailing lists, I think
> that we have lost some context. Steve has
> described a paradigm for *representing* tables
> as trees. That paradigm must still be
> *implemented*. Michael's original question was
> about implementation strategies. I don't know of
> an implementation strategy that overcomes the
> basic impedance mismatch between relational data
> and tree data.

I was assuming that the metaphor, "impedance
mismatch," refers to the phenomenon of signal
degradation at the interface between two electronic
components, one of which is supplying a signal to
the other, where the electrical characteristics of
the signal produced by the sender are
inappropriate to the electrical characteristics of
the receiver.  In electronic design, transformers
(or trickier, more active things) are used to
overcome impedance mismatches, so that the signal
is not degraded.  Perhaps I misunderstood the
question (and perhaps I still misunderstand it),
but I thought the essential question was: how can
transformations be done that allow these two very
different kinds of information (two kinds of
signals, if you will) to be losslessly converted
into one another?  In other words: What does the
transformer (or other trickier thing) do, and,
more to the point, what is the science on which it
is based?

I was suggesting that one "science" on which such
a "transformer" could be based is the grove
paradigm, which was developed precisely for the
purpose of allowing things that are naturally and
inevitably trees, such as interchangeable XML
documents, to accurately reflect the information
content -- the "signal" -- of information whose
inherent structure, in its most useful form, is
completely arbitrary (and could therefore be an
RDBMS, to mention just one example).

The implementation strategy is to apply the grove
paradigm.

This involves expressing the model to which the
RDBMS data most usefully and naturally conforms as
a property set -- in effect, the abstract API to
the semantics of the RDBMS.  If the schema of the
RDBMS and the property set of its semantics
resemble one another, no one should be surprised;
they are ideally alternative ways of looking at
the same information set.  However, because RDBMSs
have inherent strict limitations on their modeling
capabilities (tables having exactly two
dimensions, for example), the design of such
databases doesn't always reflect the natural
structure of the information that the data
represent, nor does the information always present
itself to applications, directly from the
database, in a fashion which is the most
convenient or natural.  So, a special interface is
usually built that provides a more intuitive or
more summarized interface, for the convenience of
applications.  But, in RDBMS-land, the natural
underlying semantic structure of the information
-- the signal -- is not formalized all the way up
to this "convenience API".  The "convenience API"
actually is, however, the best available
expression of the raison d'etre of the information
and the anticipated applications of it.  If the
convenience API were rationalized, formalized,
codified, and interchanged as a model of the
information set, rather than as a set of procedure
calls, it would improve the reliability with which
the information set could be accurately
interchanged, and it would widen the scope of
applications that could make use of the
information.  The "property sets" formalism of the
grove paradigm is designed to be an
internationally standard modality of codifying and
interchanging just such abstract "convenience"
APIs.  (By the way, it is no big deal to make an
RDBMS appear to be a grove, if only you have first
codified the real information set as a property
set.  If your information really is
two-dimensional tables, or even n-dimensional
tables, it's not a problem.)
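
To make "an RDBMS appearing to be a grove" a little
more concrete, here is a minimal sketch in Python.
It is only an illustration under assumptions of my
own: the Node class, the "table" and "row" class
names, and the table_to_grove helper are
hypothetical, not the property-set notation of the
standard and not any product's API.  It simply
exposes one table as a node whose children are row
nodes, each carrying named properties.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Node:
    """A grove-like node: a class name plus named properties (hypothetical)."""
    cls: str
    props: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def table_to_grove(conn, table):
    """Expose one table as a 'table' node whose children are 'row' nodes.

    The table name is interpolated into the SQL, so this sketch is only
    suitable for trusted, known table names.
    """
    cur = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cur.description]
    root = Node("table", {"name": table, "columns": columns})
    for row in cur:
        # Each row becomes a node whose properties are the column values.
        root.children.append(Node("row", dict(zip(columns, row))))
    return root


# Usage: an in-memory database with one small, made-up table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO part VALUES (?, ?)", [(1, "bolt"), (2, "nut")])
grove = table_to_grove(conn, "part")
```

A real property set would describe the semantics
of the information, not merely echo the columns;
the sketch only shows that tabular data fits the
node-and-property shape without strain.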

Assuming you have made the RDBMS look like a
grove, the transformation of the information in
the RDBMS into interchangeable form as an XML
document is more or less straightforward.  You
have to create a DTD that can represent the
information set nonredundantly, and
transformations between the grove and
interchangeable forms of the information set
should be clearly specified in an "Architecture
Definition Document" that includes the DTD and the
property set.  Finally, software that performs the
transformation -- an "architecture engine" which,
like an impedance-matching device, may or may not
be bi-directional -- is written.  (The GroveMinder
system provides tools that facilitate this
process, as well as permitting multiple
architecture engines to be used concurrently with
the same multiple-architecture-inheriting
interchangeable information.  But this business of
GroveMinder and this other business of multiple
architecture inheritance are other juicy topics
for other occasions.)
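
As a rough illustration of a bi-directional
"architecture engine", here is a small Python
sketch that writes rows out as XML and reads them
back unchanged.  Everything in it is an assumption
made for the example: the element names (table,
row, cell), the "col" and "type" attributes, and
the to_xml/from_xml functions are hypothetical, no
DTD is produced, and only int and str values are
handled.  It is not how GroveMinder works; it only
shows what "lossless in both directions" means in
the smallest possible case.

```python
import xml.etree.ElementTree as ET

# Hypothetical type tags recorded in the XML so values survive the round trip.
CASTS = {"int": int, "str": str}


def to_xml(name, rows):
    """Serialize a list of row dicts (int/str values only) as an XML string."""
    root = ET.Element("table", {"name": name})
    for row in rows:
        row_el = ET.SubElement(root, "row")
        for col, value in row.items():
            cell = ET.SubElement(
                row_el, "cell", {"col": col, "type": type(value).__name__}
            )
            cell.text = str(value)
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Rebuild the table name and row dicts from the XML produced by to_xml."""
    root = ET.fromstring(text)
    rows = []
    for row_el in root.findall("row"):
        row = {}
        for cell in row_el.findall("cell"):
            cast = CASTS[cell.get("type")]
            # An empty element parses with text None, so fall back to "".
            row[cell.get("col")] = cast(cell.text or "")
        rows.append(row)
    return root.get("name"), rows


# Usage: convert made-up rows to XML and back again.
rows = [{"id": 1, "name": "bolt"}, {"id": 2, "name": "nut"}]
xml_text = to_xml("part", rows)
name_back, rows_back = from_xml(xml_text)
```

The type attribute is the interesting design
choice: without it the integers would come back as
strings, which is exactly the kind of quiet loss
that a clearly specified transformation is meant
to rule out.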

I do feel that the grove paradigm is an elegant
way to meet all the requirements of reliable
industry-wide information interchange in an open,
multivendor, arbitrary-application,
arbitrary-information-set,
arbitrary-storage-facility environment -- in other
words, in the real world, with all its actual
warts.  It's not an accident that the grove
paradigm is the subject of the only international
standard for defining what an application sees
when processing information that must be
interchanged according to some arbitrary
interchange representation, such as XML (but not
limited to XML, and, unlike the DOM, fully
capable, from first principles, of incorporating
DTD-specific semantics and supporting addressing
based on arbitrary properties as well as
structural phenomena).

The grove paradigm allows the accuracy and
reliability of information interchange to be
whatever people are prepared to pay for; it can be
used in quick and dirty ways, but it outshines all
others when information must be handled with
respect for its integrity.  One thing you can't do
with the grove paradigm is to conceal the ways in
which you chose to be quick and dirty.  I think
that's good for information interchange.  It's
good news for information owners, creators,
maintainers, and users.  It's probably not good
news for traditional software business models.
(Who cares?)

-Steve

--
Steven R. Newcomb, President, TechnoTeacher, Inc.
srn@techno.com  http://www.techno.com  ftp.techno.com

voice: +1 972 231 4098 (at ISOGEN: +1 214 953 0004 x137)
fax    +1 972 994 0087 (at ISOGEN: +1 214 953 3152)

3615 Tanner Lane
Richardson, Texas 75082-2618 USA