[XML-SIG] XBEL DTD as a meta-dtd

Tue, 15 Sep 1998 17:34:58 +0200

Hi,

This post became rather long (my DTD is at the bottom).

I did some experimenting with Geir's xmlarch.py, and it works nicely
(once you update to the newest SAX code).

I made this effort because I thought XBEL could be used in a website
management tool that checks external links and reports on them
(something like linbot). Such a processor could store information
about the links in the 'info' elements, and the ids could refer back
to the original XML document.
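As a rough illustration of that link-checking idea, here is a minimal
Python sketch using the standard library's xml.etree.ElementTree. It
assumes the XBEL DTD at the bottom of this post, where url has optional
'response' and 'checked' attributes and bookmark has the content model
(info?, url, desc?). The function name record_check, the meta name
"last-status" and the date format are hypothetical, and no actual HTTP
request is made.

```python
import xml.etree.ElementTree as ET

def record_check(bookmark, status, when):
    """Store the result of a (hypothetical) link check on an XBEL bookmark.

    Assumes the bookmark content model (info?, url, desc?) and that the
    bookmark contains a url element.
    """
    url = bookmark.find("url")
    # The url element has optional 'response' and 'checked' attributes.
    url.set("response", str(status))
    url.set("checked", when)
    # Extra details go into the bookmark's info block as meta elements.
    info = bookmark.find("info")
    if info is None:
        info = ET.Element("info")
        bookmark.insert(0, info)  # info must come before url
    # "last-status" is a hypothetical meta name, not part of XBEL.
    ET.SubElement(info, "meta", name="last-status", content=str(status))
    return bookmark

bm = ET.fromstring('<bookmark><url id="A101" href="a">A</url></bookmark>')
record_check(bm, 200, "1998-09-15")
print(ET.tostring(bm, encoding="unicode"))
```

Because url itself is declared as (#PCDATA), any meta information has
to live in the enclosing bookmark, which is why the sketch writes to
the bookmark's info block rather than to the url element.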

In order for XBEL to function as a meta-DTD, I needed to loosen some
restrictions and make a few changes to the XBEL DTD. With these
changes it is possible to derive XBEL from many XML documents just by
specifying how the mapping has to take place. This architectural
processing is standardized (annex A.3 of ISO/IEC 10744:1997), so I
could use other architectural engines to do the same (for example XAF
by David Megginson). No specialized XML processors need to be
coded. The XBEL document is effectively a virtual document, derived
automatically from the XML source.

For some more examples and explanations look at the documentation for
XAF (http://www.megginson.com/XAF).

Thanks, Geir, for making this possible in Python.

At the end I included the XBEL DTD as I use it now. Maybe we could
reach a consensus. The DTD is looser now, which makes processing XBEL
documents a little more difficult. Processors that output XBEL are not
affected much, since they can always output a more restricted form of
XBEL, but it would be nice if a processor that reads XBEL could cope
with the looser XBEL DTD.
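What coping with the looser DTD might involve: in the DTD at the end, a
url can appear directly inside xbel or folder, or wrapped in a
bookmark. Below is a minimal Python sketch using xml.etree.ElementTree,
assuming that DTD; collect_links is a hypothetical helper, and info,
desc, separator and alias elements are simply skipped.

```python
import xml.etree.ElementTree as ET

def collect_links(node, path=()):
    """Yield (folder title path, href) pairs from an XBEL element.

    Handles both forms allowed by the looser DTD: a bare url inside
    xbel/folder, and a url wrapped in a bookmark element.
    """
    for child in node:
        if child.tag == "folder":
            title = child.findtext("title", default="")
            yield from collect_links(child, path + (title,))
        elif child.tag == "url":
            yield path, child.get("href")
        elif child.tag == "bookmark":
            url = child.find("url")
            if url is not None:
                yield path, url.get("href")
        # title, info, desc, separator and alias are ignored here.

doc = ET.fromstring("""<xbel><title>Links</title>
  <folder><title>Bare</title><url href="a">A</url></folder>
  <folder><title>Wrapped</title>
    <bookmark><url href="b">B</url><desc>A wrapped link</desc></bookmark>
  </folder>
</xbel>""")

for path, href in collect_links(doc):
    print("/".join(path), href)
```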

Here's an example of two simplified XML fragments:

  <tei>
    <div1><head>Chapter 1</head>
      <p>This is <xref href="a">A</xref>.</p>
      <p>This is <xref href="b">B</xref>.</p>
      <div2><head>Chapter 2</head>
        <p>This is <xref href="c">C</xref>.</p>
      </div2>
    </div1>
  </tei>

[This is not real TEI, since it lacks an easy way to refer to a URL]

  <book><title>My Book</title>
    <chapter><title>Chapter 1</title>
      <para>This is <ulink url="a">A</ulink>.</para>
      <para>This is <ulink url="b">B</ulink>.</para>
    </chapter>
    <chapter><title>Chapter 2</title>
      <para>This is <ulink url="c">C</ulink>.</para>
    </chapter>
  </book>

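Obviously there are some structural differences: in the first fragment
'Chapter 2' is a div2 nested inside the div1 for 'Chapter 1', while in
the second both chapters are siblings. The element names differ as
well: a paragraph is called 'p' in the first and 'para' in the second,
and a chapter is called 'div1' in the first and 'chapter' in the
second.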

With architectural forms you can extract a structured list of URLs
from both of these without creating a separate processor for
each. Just specify how the derivation should work and process the
document with an architectural forms processor (like xmlarch.py).
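The idea of "one processor, one mapping per DTD" can be sketched in a
few lines of Python. This is not ISO 10744 processing and not
xmlarch.py's API: the mapping tables TEI_MAP and BOOK_MAP and the
function extract_urls are hypothetical, written directly in Python
instead of as attributes in the DTD. The same function is applied to
both fragments above; only the table changes.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping tables, one per client DTD:
#   client element name -> (XBEL role, attribute holding the link target)
TEI_MAP = {"div1": ("folder", None), "div2": ("folder", None),
           "head": ("title", None), "xref": ("url", "href")}
BOOK_MAP = {"chapter": ("folder", None), "title": ("title", None),
            "ulink": ("url", "url")}

def role_of(element, mapping):
    """Return the XBEL role of an element, or None if it is unmapped."""
    return mapping.get(element.tag, (None, None))[0]

def extract_urls(node, mapping, path=()):
    """Return (folder title path, url) pairs, driven only by the mapping."""
    result = []
    for child in node:
        role, attr = mapping.get(child.tag, (None, None))
        if role == "folder":
            # Use the folder's first title-like child as its name.
            title = next((c.text for c in child
                          if role_of(c, mapping) == "title"), None)
            result.extend(extract_urls(child, mapping, path + (title,)))
        elif role == "url":
            result.append((path, child.get(attr)))
        elif role is None:
            # Unmapped elements (p, para, ...) are transparent.
            result.extend(extract_urls(child, mapping, path))
    return result

TEI = """<tei>
  <div1><head>Chapter 1</head>
    <p>This is <xref href="a">A</xref>.</p>
    <p>This is <xref href="b">B</xref>.</p>
    <div2><head>Chapter 2</head>
      <p>This is <xref href="c">C</xref>.</p>
    </div2>
  </div1>
</tei>"""

BOOK = """<book><title>My Book</title>
  <chapter><title>Chapter 1</title>
    <para>This is <ulink url="a">A</ulink>.</para>
    <para>This is <ulink url="b">B</ulink>.</para>
  </chapter>
  <chapter><title>Chapter 2</title>
    <para>This is <ulink url="c">C</ulink>.</para>
  </chapter>
</book>"""

for source, mapping in ((TEI, TEI_MAP), (BOOK, BOOK_MAP)):
    print(extract_urls(ET.fromstring(source), mapping))
```

In the TEI result 'Chapter 2' stays nested under 'Chapter 1', because
the mapping only renames elements; it never moves them.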

To show how that works I used the 'book' example:

Here's the complete document:

<?xml version='1.0'?>
<?IS10744:arch name="xbel"
               auto="nArcAuto"
               renamer-att="xbel-atts"
               dtd-system-id="xbel.dtd"
               suppressor-att="suppress"
               ignore-data-att="ignore"
?>
<!DOCTYPE book SYSTEM "db3xml10.dtd" [

  <!ATTLIST title   xbel NMTOKEN "title">
  <!ATTLIST chapter xbel NMTOKEN "folder"
                    xbel-atts NMTOKENS ""
  >
  <!ATTLIST ulink   xbel NMTOKEN "url"
                    xbel-atts NMTOKENS "url href baz #DEFAULT"
                    ignore NMTOKEN "nArcIgnD"
  > 
  <!ATTLIST para    suppress NMTOKEN "sArcNone">
]>

<book><title>My Book</title>

  <chapter id="ch1">
    <title>Chapter 1</title> 

    <para>This is <ulink id="A101" url="a"><acronym>
      <emphasis>A</emphasis></acronym></ulink></para>

    <para>This is <ulink id="A123" 
      url="b">B</ulink></para>

  </chapter>

  <chapter id="ch2">
    <title>Chapter 2</title>

    <para>This is <ulink id="A23" url="c">C</ulink></para>

  </chapter>
</book>

Feeding this to xmlarch.py results in the following architectural (or
virtual) document:

<xbel><title>My Book</title>

  <folder id="ch1">
    <title>Chapter 1</title> 

    <url href="a" id="A101">A</url>

    <url href="b" id="A123">B</url>

  </folder>

  <folder id="ch2">
    <title>Chapter 2</title>

    <url href="c" id="A23">C</url>

  </folder>
</xbel>

As you can see, xmlarch.py derived the XBEL document from the book
document. The chapter elements became folders, the ulinks became
urls, and every url attribute became an href attribute. It also
stripped the elements inside the first ulink, keeping only their text,
and the para elements do not appear in the output at all.
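To make the derivation steps concrete, here is a toy Python
re-implementation for the book example only. It is a simplified
sketch, not how xmlarch.py or ISO 10744 engines work: it does not read
the IS10744:arch processing instruction or the architectural
attributes in the internal DTD subset. Instead the mapping is
hard-coded in a hypothetical ARCH_MAP table, and whitespace in url
content is stripped for readability.

```python
import xml.etree.ElementTree as ET

# Hard-coded stand-in for the xbel / xbel-atts attributes in the DTD:
#   client element -> (architectural element, {client attr: arch attr})
ARCH_MAP = {
    "book":    ("xbel",   {}),
    "title":   ("title",  {}),
    "chapter": ("folder", {"id": "id"}),
    "ulink":   ("url",    {"id": "id", "url": "href"}),
}

def derive(client):
    """Return the architectural (XBEL) elements for a client element's children."""
    derived = []
    for child in client:
        if child.tag in ARCH_MAP:
            name, renames = ARCH_MAP[child.tag]
            attrs = {new: child.get(old) for old, new in renames.items()
                     if child.get(old) is not None}
            arch = ET.Element(name, attrs)
            if name == "url":
                # Keep the data, drop the sub-elements (acronym, emphasis).
                arch.text = "".join(child.itertext()).strip()
            elif name == "title":
                arch.text = child.text
            else:
                arch.extend(derive(child))
            derived.append(arch)
        else:
            # Non-architectural element (para): drop its own text,
            # but still map the elements inside it.
            derived.extend(derive(child))
    return derived

def derive_document(root):
    name, _ = ARCH_MAP[root.tag]
    top = ET.Element(name)
    top.extend(derive(root))
    return top

book = ET.fromstring("""<book><title>My Book</title>
  <chapter id="ch1"><title>Chapter 1</title>
    <para>This is <ulink id="A101" url="a"><acronym>
      <emphasis>A</emphasis></acronym></ulink></para>
    <para>This is <ulink id="A123" url="b">B</ulink></para>
  </chapter>
  <chapter id="ch2"><title>Chapter 2</title>
    <para>This is <ulink id="A23" url="c">C</ulink></para>
  </chapter>
</book>""")

xbel = derive_document(book)
print(ET.tostring(xbel, encoding="unicode"))
```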

If we want XBEL to work as a meta-DTD for these kinds of tasks, some
changes to the DTD are in order. Architectural forms can do many
things, but they cannot completely reorder the original document, so
the XBEL DTD (the meta-DTD) and the XML DTD used (the client DTD) need
to have some structural similarities.
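To make the "cannot reorder" point concrete: the folder content model
in the DTD below requires title, info and desc to come before the
items. The following minimal Python sketch (standard re and
xml.etree.ElementTree only; folder_ok is a hypothetical helper that
checks only the names of a folder's direct children) shows that a
derived folder whose title follows its urls does not fit the model.
Since architectural processing keeps the client's element order, a
client DTD that puts the title last could not be mapped onto this
folder model as-is.

```python
import re
import xml.etree.ElementTree as ET

# The folder content model from the XBEL DTD, written as a regular
# expression over the space-separated sequence of child element names.
FOLDER_MODEL = re.compile(
    r"(title )?(info )?(desc )?((bookmark|folder|separator|alias|url) )*")

def folder_ok(folder):
    """Check a folder's direct children against the content model."""
    sequence = "".join(child.tag + " " for child in folder)
    return FOLDER_MODEL.fullmatch(sequence) is not None

good = ET.fromstring('<folder><title>T</title><url href="a">A</url></folder>')
bad = ET.fromstring('<folder><url href="a">A</url><title>T</title></folder>')
print(folder_ok(good), folder_ok(bad))  # True False
```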

=========== my current XBEL DTD ================

<!ELEMENT xbel     (title?,info?, (bookmark|folder|url|alias|separator)*)>
<!ATTLIST xbel
            version CDATA   #IMPLIED
>

<!ELEMENT title    (#PCDATA)>

<!--=================== Info block ================================-->

<!ELEMENT info    (meta)*>

<!ELEMENT meta    EMPTY>
<!ATTLIST meta
            name    CDATA #REQUIRED
            content CDATA #REQUIRED
>

<!--=================== Folder ====================================-->

<!ELEMENT folder   (title?,info?,desc?,(bookmark|folder|separator|alias|url)*)>
<!ATTLIST folder
            id       ID       #IMPLIED
            added    CDATA    #IMPLIED
            folded   (yes|no) 'yes'   
>

<!--=================== URL ======================================-->

<!ELEMENT url        (#PCDATA)>
<!ATTLIST url
            id       ID       #IMPLIED
            href     CDATA    #REQUIRED
            added    CDATA    #IMPLIED
            visited  CDATA    #IMPLIED
            modified CDATA    #IMPLIED
            response CDATA    #IMPLIED
            checked  CDATA    #IMPLIED
>

<!--=================== Bookmark ==================================-->
<!-- a wrapper around a url when it has to carry extra information,
     such as a description or an info block
-->
<!ELEMENT bookmark (info?,url,desc?)>

<!ELEMENT desc       (#PCDATA)>

<!--=================== Separator =================================-->

<!ELEMENT separator EMPTY>

<!--=================== Alias =====================================-->

<!ELEMENT alias EMPTY>
<!ATTLIST alias
            ref       IDREF    #REQUIRED
>
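Since alias carries only an IDREF, a program reading XBEL has to look
up the target itself. A minimal Python sketch, assuming the whole XBEL
document is in memory as an ElementTree and that ids are unique
(ElementTree does not check this); resolve_aliases is a hypothetical
helper, not part of any existing library.

```python
import xml.etree.ElementTree as ET

def resolve_aliases(xbel):
    """Return (alias, target) pairs, following each alias's 'ref' IDREF."""
    # Index every element that has an id (folders, urls, ...).
    by_id = {el.get("id"): el for el in xbel.iter() if el.get("id") is not None}
    resolved = []
    for alias in xbel.iter("alias"):
        target = by_id.get(alias.get("ref"))
        if target is None:
            # A validating parser would reject a dangling IDREF.
            raise ValueError("alias refers to unknown id %r" % alias.get("ref"))
        resolved.append((alias, target))
    return resolved

doc = ET.fromstring("""<xbel>
  <folder id="ch1"><title>Chapter 1</title><url id="A101" href="a">A</url></folder>
  <folder id="ch2"><title>Chapter 2</title><alias ref="A101"/></folder>
</xbel>""")

for alias, target in resolve_aliases(doc):
    print(alias.get("ref"), "->", target.tag, target.get("href"))
```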