[XML-SIG] XBEL DTD as a meta-dtd
Marc van Grootel
Tue, 15 Sep 1998 17:34:58 +0200
This post became rather long (my DTD is at the bottom).
I did some experimenting with Geir's xmlarch.py and it works nicely
(once you update to the newest sax stuff).
I made this effort because I thought XBEL could be used in a Website
management tool that checks external links and reports on them
(something like linbot). Such a processor could store information
about the links in the 'info' elements, and the IDs could refer back
to the original XML document.
In order for XBEL to function as a meta-DTD I needed to loosen some
restrictions and make a few changes to the XBEL DTD. With these
changes it is possible to derive XBEL from many XML documents just by
specifying how the mapping has to take place. This architectural
processing is standardized (annex A.3 of ISO/IEC 10744:1997) so I
could use other architectural engines to do the same (for example XAF
by David Megginson). No coding of specialized XML processors
needed. The XBEL is like a virtual document automatically derived from
the XML source.
For some more examples and explanations look at the documentation for
Thanks, Geir for making this possible in Python.
At the end I included the xbel dtd as I use it now. Maybe we could
reach a consensus. The DTD is looser now which makes processing it a
little more difficult. Processors that output XBEL are not affected
much since they could always output a more restricted form of XBEL but
it would be nice if a processor that reads XBEL could cope with the
looser XBEL DTD.
Here's an example of two simplified XML fragments:
<p>This is <xref href="a">A</xref>.</p>
<p>This is <xref href="b">B</xref>.</p>
<p>This is <xref href="c">C</xref>.</p>
[This is not real TEI since it lacks an easy way to refer to an url]
<para>This is <ulink url="a">A</ulink>.</para>
<para>This is <ulink url="b">B</ulink>.</para>
<para>This is <ulink url="c">C</ulink>.</para>
Obviously there are some structural differences. The element names
differ too: a paragraph is called 'p' in the first and 'para' in the
other, and a chapter is called 'div1' in the first and 'chapter' in
the other.
With architectural forms you can extract a structured list of URLs
from both of these without creating a separate processor for
each. Just specify how the derivation should work and process the
document with an architectural forms processor (like xmlarch.py).
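As a rough illustration of the idea (this is not how xmlarch.py is called;
it is only a sketch using Python's standard xml.etree.ElementTree), one
generic function can pull the links out of both fragments when it is handed
a small per-vocabulary mapping. The names LINK_MAPS and extract_links are
made up for this example, and the div1/chapter wrappers are assumed.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-vocabulary mappings: which element carries a link,
# and which of its attributes holds the target.
LINK_MAPS = {
    "tei-like": {"element": "xref", "attribute": "href"},
    "docbook-like": {"element": "ulink", "attribute": "url"},
}


def extract_links(xml_text, mapping):
    """Return (target, label) pairs for every link element in xml_text."""
    root = ET.fromstring(xml_text)
    links = []
    for elem in root.iter(mapping["element"]):
        target = elem.get(mapping["attribute"])
        label = "".join(elem.itertext())  # text of the link, markup flattened
        links.append((target, label))
    return links


# Assumed wrappers around the simplified fragments shown earlier.
tei = (
    '<div1><p>This is <xref href="a">A</xref>.</p>'
    '<p>This is <xref href="b">B</xref>.</p></div1>'
)
docbook = (
    '<chapter><para>This is <ulink url="a">A</ulink>.</para>'
    '<para>This is <ulink url="b">B</ulink>.</para></chapter>'
)

tei_links = extract_links(tei, LINK_MAPS["tei-like"])
docbook_links = extract_links(docbook, LINK_MAPS["docbook-like"])
```

The point is only that the per-document knowledge lives in the mapping, not
in the code; an architectural forms processor takes that mapping from the
attribute declarations instead of from a Python dictionary.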
To show how that works I used the 'book' example:
Here's the complete document:
<!DOCTYPE book SYSTEM "db3xml10.dtd" [
<!ATTLIST title   xbel      NMTOKEN  "title">
<!ATTLIST chapter xbel      NMTOKEN  "folder"
                  xbel-atts NMTOKENS "">
<!ATTLIST ulink   xbel      NMTOKEN  "url"
                  xbel-atts NMTOKENS "url href baz #DEFAULT"
                  ignore    NMTOKEN  "nArcIgnD">
<!ATTLIST para    suppress  NMTOKEN  "sArcNone">
]>
<para>This is <ulink id="A101" url="a"><acronym>A</acronym></ulink></para>
<para>This is <ulink id="A123" url="b">B</ulink></para>
<para>This is <ulink id="A23" url="c">C</ulink></para>
Feeding this to xmlarch.py results in the following architectural (or
<url href="a" id="A101">A</url>
<url href="b" id="A123">B</url>
<url href="c" id="A23">C</url>
As you can see, xmlarch.py derived the XBEL document from the book
document. The chapter elements are changed to folder elements, the
ulink elements are changed to url elements, and every url attribute is
changed to an href attribute. It also stripped the elements inside the
first ulink.
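To make the derivation a bit more concrete, here is a minimal sketch of the
same renaming done by hand with xml.etree.ElementTree. It is not xmlarch.py
and does not implement real architectural processing. The FORMS table, the
TEXT_ONLY set, the derive function and the sample book are all made up for
illustration, and the mapping of 'book' to 'xbel' is an assumption.

```python
import xml.etree.ElementTree as ET

# Assumed mapping, mirroring the ATTLIST declarations in the example:
# client element name -> (XBEL element name, {client attr: XBEL attr}).
FORMS = {
    "book": ("xbel", {}),  # assumption: the document element maps to xbel
    "title": ("title", {}),
    "chapter": ("folder", {}),
    "ulink": ("url", {"url": "href", "id": "id"}),
}
# XBEL elements holding text only, so nested client markup is flattened.
TEXT_ONLY = {"url", "title"}


def derive(elem, parent=None):
    """Build the XBEL view of elem under parent; unmapped elements vanish."""
    form = FORMS.get(elem.tag)
    if form is None:
        # Not mapped (e.g. para): drop the element and its own text, but
        # keep looking for mapped descendants and attach them to parent.
        for child in elem:
            derive(child, parent)
        return parent
    name, attmap = form
    attrs = {attmap[k]: v for k, v in elem.attrib.items() if k in attmap}
    if parent is None:
        out = ET.Element(name, attrs)
    else:
        out = ET.SubElement(parent, name, attrs)
    if name in TEXT_ONLY:
        out.text = "".join(elem.itertext())
    else:
        for child in elem:
            derive(child, out)
    return out


# Hypothetical client document in the spirit of the book example.
book = (
    "<book><title>Links</title><chapter><title>One</title>"
    '<para>This is <ulink id="A101" url="a"><acronym>A</acronym></ulink></para>'
    '<para>This is <ulink id="A23" url="c">C</ulink></para>'
    "</chapter></book>"
)
xbel = derive(ET.fromstring(book))
print(ET.tostring(xbel, encoding="unicode"))
```

With an architectural engine none of this code has to be written per
document type; only the attribute declarations in the internal subset change.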
If we want XBEL to work as a meta-DTD for these kinds of things, some
changes to the DTD are in order. Architectural forms can do many
things, but they cannot completely reorder the original document, so
the XBEL DTD (meta-DTD) and the XML DTD used (client DTD) need to have
some structural similarities.
=========== my current XBEL DTD ================
<!ELEMENT xbel (title?,info?, (bookmark|folder|url|alias|separator)*)>
<!ATTLIST xbel
  version CDATA #IMPLIED
>
<!--=================== Info block ================================-->
<!ELEMENT info (meta)*>
<!ELEMENT meta EMPTY>
<!ATTLIST meta
  name CDATA #REQUIRED
  content CDATA #REQUIRED
>
<!--=================== Folder ====================================-->
<!ELEMENT folder (title?,info?,desc?,(bookmark|folder|separator|alias|url)*)>
<!ATTLIST folder
  id ID #IMPLIED
  added CDATA #IMPLIED
  folded (yes|no) 'yes'
>
<!--=================== URL ======================================-->
<!ELEMENT url (#PCDATA)>
<!ATTLIST url
  id ID #IMPLIED
  href CDATA #REQUIRED
  added CDATA #IMPLIED
  visited CDATA #IMPLIED
  modified CDATA #IMPLIED
  response CDATA #IMPLIED
  checked CDATA #IMPLIED
>
<!--=================== Bookmark ==================================-->
<!-- a wrapper around a url when it has to contain extra info
like a description and info -->
<!ELEMENT bookmark (info?,url,desc?)>
<!ELEMENT desc (#PCDATA)>
<!--=================== Separator =================================-->
<!ELEMENT separator EMPTY>
<!--=================== Alias =====================================-->
<!ELEMENT alias EMPTY>
<!ATTLIST alias
  ref IDREF #REQUIRED
>
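
For processors that read XBEL, coping with the looser DTD mostly means
accepting a url both on its own and wrapped in a bookmark. A minimal sketch
of such a reader, assuming Python's standard xml.etree.ElementTree (the
function name iter_links and the sample document are made up for this
example):

```python
import xml.etree.ElementTree as ET


def iter_links(container, path=()):
    """Yield (folder_path, href, label, description) for each url found.

    Accepts both shapes allowed by the DTD above: a bare <url>, and a
    <url> wrapped in <bookmark> together with an optional <desc>.
    """
    for child in container:
        if child.tag == "url":
            yield path, child.get("href"), child.text or "", None
        elif child.tag == "bookmark":
            url = child.find("url")  # required by the content model
            yield path, url.get("href"), url.text or "", child.findtext("desc")
        elif child.tag == "folder":
            title = child.findtext("title", default="")
            yield from iter_links(child, path + (title,))
        # title, info, desc, separator and alias carry no link of their own.


# Hypothetical XBEL document mixing the bare and the wrapped form.
sample = """<xbel>
  <url href="a" id="A101">A</url>
  <folder>
    <title>Docs</title>
    <bookmark><url href="b">B</url><desc>wrapped form</desc></bookmark>
  </folder>
</xbel>"""

links = list(iter_links(ET.fromstring(sample)))
```

A processor that writes XBEL can keep emitting the stricter bookmark form;
only readers need the extra branch.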