[XML-SIG] 4XSLT focus on docbook stylesheets (and fruit thereof)

Uche Ogbuji uche.ogbuji@fourthought.com
13 Jul 2002 23:13:36 -0600



Thanks to Mike Brown's sleuthing, we realized that the thing that was
killing 4XSLT's performance on the docbook stylesheets was key preparation.
He claimed it was 99% of the time, and I thought he was joking.  Well,
here's what happened as I worked on the problem:

The attached script is a little timing script I used for keeping track
of progress.  

At the outset, I got the following: 

time for 10 runs: 2624.12773895 seconds 
time per run 262.412773895 seconds 

After updating the XSLT matching logic to take advantage of the
shortcuts available in the XPattern processor:

time for 10 runs: 837.315540075 seconds
time per run 83.7315540075 seconds

Which is nice, but I knew I could do better.  Turns out that was
understating it.  After the big move, deferring key computation until
a key is actually requested in each doc's context, we go all the way
down to:

time for 10 runs: 17.4807579517 seconds
time per run 1.74807579517 seconds
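
As a rough illustration of the lazy approach described above, here is a
minimal sketch. It is not 4Suite's actual code: the LazyKeyTable class, its
method names, and the flat list-of-dicts document model are all hypothetical,
chosen only to show an index being built on first lookup for a given key and
document rather than for every key up front.

```python
# Hypothetical sketch of lazy xsl:key indexing; NOT 4Suite's implementation.
# Documents are modelled as flat lists of dict "nodes" for brevity.

class LazyKeyTable:
    """Builds the index for a (key name, document) pair on first lookup."""

    def __init__(self, key_defs):
        # key_defs maps key name -> (match_fn, use_fn), where
        # match_fn(node) says whether the key applies to the node and
        # use_fn(node) returns the string value the node is indexed under.
        self.key_defs = key_defs
        self._indexes = {}   # (key name, id(doc)) -> {value: [nodes]}
        self.builds = 0      # number of indexes actually built

    def lookup(self, key_name, doc, value):
        cache_key = (key_name, id(doc))
        index = self._indexes.get(cache_key)
        if index is None:
            # First request for this key in this document: build it now.
            index = self._build(key_name, doc)
            self._indexes[cache_key] = index
        return index.get(value, [])

    def _build(self, key_name, doc):
        match_fn, use_fn = self.key_defs[key_name]
        index = {}
        for node in doc:
            if match_fn(node):
                index.setdefault(use_fn(node), []).append(node)
        self.builds += 1
        return index


main_doc = [{"name": "sect1", "id": "a"},
            {"name": "para"},
            {"name": "chapter", "id": "b"}]
l10n_doc = [{"name": "l10n", "id": "en"}]

table = LazyKeyTable({
    "sections": (lambda n: "id" in n, lambda n: n["id"]),
    "unused": (lambda n: True, lambda n: n["name"]),
})

hits = table.lookup("sections", main_doc, "b")
print(hits, table.builds)
```

In this sketch the "unused" key is never indexed at all, and a second
document such as l10n_doc costs nothing until a key is looked up in it.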


Guys, I think the docbook ghoul is put to bed.  :-)
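
For a sense of scale, the three per-run timings quoted above can be turned
into speedup factors. The snippet below only does that arithmetic on the
numbers from the message; the variable names are illustrative.

```python
# Per-run timings copied from the message above, in seconds.
initial = 262.412773895     # at the outset
xpattern = 83.7315540075    # after the XPattern shortcut changes
lazy_keys = 1.74807579517   # after making key computation lazy

step1 = initial / xpattern      # speedup from the matching changes
step2 = xpattern / lazy_keys    # further speedup from lazy keys
overall = initial / lazy_keys   # combined speedup
saved = 100.0 * (1 - lazy_keys / initial)  # percent of original time removed

print("matching changes: %.1fx" % step1)    # 3.1x
print("lazy keys:        %.1fx" % step2)    # 47.9x
print("overall:          %.1fx" % overall)  # 150.1x
print("time removed:     %.1f%%" % saved)   # 99.3%
```

The two changes together remove roughly 99.3% of the original run time,
which is in the same ballpark as the 99% figure mentioned at the top.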

I've also gone through the archives and fixed every bug that I thought
might affect docbook XSL processing.

To try it, use the 4Suite CVS HOWTO

http://4suite.org/docs/4SuiteCVS.xml

but just use the branch

EXPAT-1-95-4-branch

which includes a few docbook fixes (in addition to updating the expat
version).

Feedback this week would be especially valuable as we're planning on a
final 0.12.0 alpha release on Friday.


-- 
Uche Ogbuji                                    Fourthought, Inc.
http://uche.ogbuji.net    http://4Suite.org    http://fourthought.com
Track chair, XML/Web Services One Boston: http://www.xmlconference.com/
The many heads of XML modeling - http://adtmag.com/article.asp?id=6393
Will XML live up to its promise? -
http://www-106.ibm.com/developerworks/xml/library/x-think11.html

Attachment: db-keys.py

# The keys from docbook/xslt, boiled down by Mike Brown.
# See http://lists.fourthought.com/pipermail/4suite-dev/2002-July/000293.html
XSLT = """<?xml version='1.0'?>
<!DOCTYPE xsl:stylesheet [

<!ENTITY lowercase "'abcdefghijklmnopqrstuvwxyz'">
<!ENTITY uppercase "'ABCDEFGHIJKLMNOPQRSTUVWXYZ'">

<!ENTITY primary   'concat(primary/@sortas, primary[not(@sortas)])'>
<!ENTITY secondary 'concat(secondary/@sortas, secondary[not(@sortas)])'>
<!ENTITY tertiary  'concat(tertiary/@sortas, tertiary[not(@sortas)])'>

<!ENTITY section   '(ancestor-or-self::set
                     |ancestor-or-self::book
                     |ancestor-or-self::part
                     |ancestor-or-self::reference
                     |ancestor-or-self::partintro
                     |ancestor-or-self::chapter
                     |ancestor-or-self::appendix
                     |ancestor-or-self::preface
                     |ancestor-or-self::section
                     |ancestor-or-self::sect1
                     |ancestor-or-self::sect2
                     |ancestor-or-self::sect3
                     |ancestor-or-self::sect4
                     |ancestor-or-self::sect5
                     |ancestor-or-self::refsect1
                     |ancestor-or-self::refsect2
                     |ancestor-or-self::refsect3
                     |ancestor-or-self::simplesect
                     |ancestor-or-self::bibliography
                     |ancestor-or-self::glossary
                     |ancestor-or-self::index)[last()]'>

<!ENTITY section.id 'generate-id(&section;)'>
<!ENTITY sep '" "'>
]>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:f="http://xmlns.4suite.org/ext"
                extension-element-prefixes="f"
                version='1.0'>

<xsl:key name="letter"
         match="indexterm"
         use="translate(substring(&primary;, 1,
1),&lowercase;,&uppercase;)"/>

<xsl:key name="primary"
         match="indexterm"
         use="&primary;"/>

<xsl:key name="secondary"
         match="indexterm"
         use="concat(&primary;, &sep;, &secondary;)"/>

<xsl:key name="tertiary"
         match="indexterm"
         use="concat(&primary;, &sep;, &secondary;, &sep;, &tertiary;)"/>

<xsl:key name="primary-section"
         match="indexterm[not(secondary) and not(see)]"
         use="concat(&primary;, &sep;, &section.id;)"/>

<xsl:key name="secondary-section"
         match="indexterm[not(tertiary) and not(see)]"
         use="concat(&primary;, &sep;, &secondary;, &sep;, &section.id;)"/>

<xsl:key name="tertiary-section"
         match="indexterm[not(see)]"
         use="concat(&primary;, &sep;, &secondary;, &sep;, &tertiary;,
&sep;, &section.id;)"/>

<xsl:key name="see-also"
         match="indexterm[seealso]"
         use="concat(&primary;, &sep;, &secondary;, &sep;, &tertiary;,
&sep;, seealso)"/>

<xsl:key name="see"
         match="indexterm[see]"
         use="concat(&primary;, &sep;, &secondary;, &sep;, &tertiary;,
&sep;, see)"/>

<xsl:key name="sections" match="*[@id]" use="@id"/>

<xsl:param name="l10n.xml"
select="document('l10n.xml')"/>

<xsl:template match="/">
  %s
</xsl:template>
</xsl:stylesheet>"""


DOC = """
<!-- based on http://www.freebsd.org/doc/en_US.ISO_8859-1/books/fdp-primer/examples.html -->

    <book>
      <bookinfo>
        <title>An Example Book</title>

        <author>
          <firstname>Your first name</firstname>
          <surname>Your surname</surname>
          <affiliation>
            <address><email>foo@example.com</email></address>
          </affiliation>
        </author>

        <copyright>
          <year>2000</year>
          <holder>Copyright string here</holder>
        </copyright>

        <abstract>
          <para>If your book has an abstract then it should go here.</para>
        </abstract>
      </bookinfo>

      <preface>
        <title>Preface</title>

        <para>Your book may have a preface, in which case it should be placed
          here.</para>
      </preface>

      <chapter>
        <title>My first chapter</title>

        <para>This is the first chapter in my book.</para>

        <sect1>
          <title>My first section</title>

          <para>This is the first section in my book.</para>
        </sect1>

        %s
      </chapter>
    </book>
"""

MIDDLE = """        <section>
          <title>My first section</title>

          <para>This is the first section in my book.</para>
        </section>
"""*50

import time
from Ft.Xml.Xslt import Processor
from Ft.Xml import InputSource
from Ft.Lib import Uri

start = time.time()
# URI for the current directory, used as the base for the stylesheet
# and source document URIs
PATH = Uri.OsPathToUri('.', attemptAbsolute=1)

# Number of timed runs (the figures quoted in the message are for 10 runs)
N = 1
for i in range(N):
    print ".",
    processor = Processor.Processor()
    transform = InputSource.DefaultFactory.fromString(XSLT % "<done/>",
                                                      PATH + "db.xslt")
    processor.appendStylesheet(transform)
    source = InputSource.DefaultFactory.fromString(DOC % MIDDLE,
                                                   PATH + "doc.xml")
    print ".",
    processor.run(source)

# One extra run that dumps the computed keys via the f:dump-keys extension.
# The template body is single-quoted because it contains double quotes.
# Note: the elapsed time reported below also covers this extra run.
processor = Processor.Processor()
transform = InputSource.DefaultFactory.fromString(
    XSLT % '<xsl:for-each select="$l10n.xml"><f:dump-keys/></xsl:for-each>',
    "http://spam.com/identity.xslt")
processor.appendStylesheet(transform)
source = InputSource.DefaultFactory.fromString(DOC % MIDDLE,
                                               "http://spam.com/doc1.xml")
output = processor.run(source)

end = time.time()
print
print "time for", N, "runs:", end-start, "seconds"
print "time per run", (end-start)/N, "seconds"
print "dump of keys:"
print output
print

