[XML-SIG] 4XSLT focus on docbook stylesheets (and fruit thereof)
Uche Ogbuji
uche.ogbuji@fourthought.com
13 Jul 2002 23:13:36 -0600
--=-V0cjoJfkKo2UZu2/tsVQ
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Thanks to Mike Brown's sleuthing, we realized that the thing that was
killing 4XSLT's performance on the docbook stylesheets was key preparation.
He claimed it accounted for 99% of the run time, and I thought he was
joking. Well, here's what happened as I worked on the problem:
The attached script is a little timing script I used for keeping track
of progress.
At the outset, I got the following:
time for 10 runs: 2624.12773895 seconds
time per run 262.412773895 seconds
After updating the XSLT pattern-matching logic to take advantage of the
shortcuts available in the XPattern processor:
time for 10 runs: 837.315540075 seconds
time per run 83.7315540075 seconds
Which is nice, but I knew I could do better. Turns out that was
understating it. After the big move, making key computation lazy so that
nothing is computed until a key is requested in each doc's context, we go
all the way down to:
time for 10 runs: 17.4807579517 seconds
time per run 1.74807579517 seconds
Guys, I think the docbook ghoul is put to bed. :-)
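
For anyone curious what "lazy" means here, below is a minimal sketch of the
idea, not the actual 4XSLT code. It uses the standard library's ElementTree
rather than 4Suite's DOM, and the names LazyKeyIndex, key_defs and lookup are
hypothetical, made up for the illustration. The point is only that a key's
index for a given document is built the first time that key is requested in
that document, and keys that are never requested cost nothing.

```python
import xml.etree.ElementTree as ET


class LazyKeyIndex(object):
    """Illustrative only: build each key's index on first lookup per document."""

    def __init__(self, key_defs):
        # key_defs maps key name -> (tag to match, function giving the use value).
        # This is a simplification of xsl:key's match/use attributes.
        self.key_defs = key_defs
        # (id(doc), key name) -> (doc, {use value: [elements]})
        # The doc reference is kept so its id() cannot be reused by another object.
        self._tables = {}
        self.builds = 0  # how many indexes have actually been computed

    def lookup(self, doc, name, value):
        cache_id = (id(doc), name)
        entry = self._tables.get(cache_id)
        if entry is None:
            # First request for this key in this document: compute it now.
            entry = (doc, self._build(doc, name))
            self._tables[cache_id] = entry
        return entry[1].get(value, [])

    def _build(self, doc, name):
        match_tag, use = self.key_defs[name]
        self.builds += 1
        table = {}
        for elem in doc.iter(match_tag):
            use_value = use(elem)
            if use_value is not None:
                table.setdefault(use_value, []).append(elem)
        return table


# Two key definitions loosely modelled on the "sections" and "primary" keys.
KEYS = {
    "sections": ("*", lambda e: e.get("id")),
    "primary": ("indexterm", lambda e: e.findtext("primary")),
}

doc = ET.fromstring(
    "<book><chapter id='c1'><indexterm><primary>keys</primary></indexterm>"
    "</chapter><chapter id='c2'/></book>"
)
index = LazyKeyIndex(KEYS)
index.lookup(doc, "sections", "c1")
index.lookup(doc, "sections", "c2")
print(index.builds)  # 1: "sections" was built once, "primary" not at all
```

With eager preparation every key would be computed for every document up
front, including documents pulled in via document() whose keys are never
consulted.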
I've also gone through the archives and fixed every bug that I thought
might affect docbook XSL processing.
To try it, use the 4Suite CVS HOWTO
http://4suite.org/docs/4SuiteCVS.xml
but just use the branch
EXPAT-1-95-4-branch
which includes a few docbook fixes (in addition to updating the expat
version).
Feedback this week would be especially valuable as we're planning on a
final 0.12.0 alpha release on Friday.
--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://4Suite.org http://fourthought.com
Track chair, XML/Web Services One Boston: http://www.xmlconference.com/
The many heads of XML modeling - http://adtmag.com/article.asp?id=6393
Will XML live up to its promise? -
http://www-106.ibm.com/developerworks/xml/library/x-think11.html
--=-V0cjoJfkKo2UZu2/tsVQ
Content-Disposition: attachment; filename=db-keys.py
Content-Transfer-Encoding: quoted-printable
Content-Type: text/x-python; name=db-keys.py; charset=ISO-8859-1
#The keys from docbook/xslt, boiled down by Mike Brown.
#See http://lists.fourthought.com/pipermail/4suite-dev/2002-July/000293.html
XSLT = """<?xml version='1.0'?>
<!DOCTYPE xsl:stylesheet [
<!ENTITY lowercase "'abcdefghijklmnopqrstuvwxyz'">
<!ENTITY uppercase "'ABCDEFGHIJKLMNOPQRSTUVWXYZ'">
<!ENTITY primary 'concat(primary/@sortas, primary[not(@sortas)])'>
<!ENTITY secondary 'concat(secondary/@sortas, secondary[not(@sortas)])'>
<!ENTITY tertiary 'concat(tertiary/@sortas, tertiary[not(@sortas)])'>
<!ENTITY section '(ancestor-or-self::set
                  |ancestor-or-self::book
                  |ancestor-or-self::part
                  |ancestor-or-self::reference
                  |ancestor-or-self::partintro
                  |ancestor-or-self::chapter
                  |ancestor-or-self::appendix
                  |ancestor-or-self::preface
                  |ancestor-or-self::section
                  |ancestor-or-self::sect1
                  |ancestor-or-self::sect2
                  |ancestor-or-self::sect3
                  |ancestor-or-self::sect4
                  |ancestor-or-self::sect5
                  |ancestor-or-self::refsect1
                  |ancestor-or-self::refsect2
                  |ancestor-or-self::refsect3
                  |ancestor-or-self::simplesect
                  |ancestor-or-self::bibliography
                  |ancestor-or-self::glossary
                  |ancestor-or-self::index)[last()]'>
<!ENTITY section.id 'generate-id(&section;)'>
<!ENTITY sep '" "'>
]>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:f="http://xmlns.4suite.org/ext"
                extension-element-prefixes="f"
                version='1.0'>
<xsl:key name="letter"
         match="indexterm"
         use="translate(substring(&primary;, 1,
              1),&lowercase;,&uppercase;)"/>
<xsl:key name="primary"
         match="indexterm"
         use="&primary;"/>
<xsl:key name="secondary"
         match="indexterm"
         use="concat(&primary;, &sep;, &secondary;)"/>
<xsl:key name="tertiary"
         match="indexterm"
         use="concat(&primary;, &sep;, &secondary;, &sep;, &tertiary;)"/>
<xsl:key name="primary-section"
         match="indexterm[not(secondary) and not(see)]"
         use="concat(&primary;, &sep;, &section.id;)"/>
<xsl:key name="secondary-section"
         match="indexterm[not(tertiary) and not(see)]"
         use="concat(&primary;, &sep;, &secondary;, &sep;, &section.id;)"/>
<xsl:key name="tertiary-section"
         match="indexterm[not(see)]"
         use="concat(&primary;, &sep;, &secondary;, &sep;, &tertiary;,
              &sep;, &section.id;)"/>
<xsl:key name="see-also"
         match="indexterm[seealso]"
         use="concat(&primary;, &sep;, &secondary;, &sep;, &tertiary;,
              &sep;, seealso)"/>
<xsl:key name="see"
         match="indexterm[see]"
         use="concat(&primary;, &sep;, &secondary;, &sep;, &tertiary;,
              &sep;, see)"/>
<xsl:key name="sections" match="*[@id]" use="@id"/>
<xsl:param name="l10n.xml"
           select="document('l10n.xml')"/>
<xsl:template match="/">
%s
</xsl:template>
</xsl:stylesheet>"""
DOC = """
<!-- based on http://www.freebsd.org/doc/en_US.ISO_8859-1/books/fdp-primer/examples.html -->
<book>
  <bookinfo>
    <title>An Example Book</title>

    <author>
      <firstname>Your first name</firstname>
      <surname>Your surname</surname>
      <affiliation>
        <address><email>foo@example.com</email></address>
      </affiliation>
    </author>

    <copyright>
      <year>2000</year>
      <holder>Copyright string here</holder>
    </copyright>

    <abstract>
      <para>If your book has an abstract then it should go here.</para>
    </abstract>
  </bookinfo>

  <preface>
    <title>Preface</title>

    <para>Your book may have a preface, in which case it should be placed
      here.</para>
  </preface>

  <chapter>
    <title>My first chapter</title>

    <para>This is the first chapter in my book.</para>

    <sect1>
      <title>My first section</title>

      <para>This is the first section in my book.</para>
    </sect1>

%s
  </chapter>
</book>
"""
MIDDLE = """    <section>
      <title>My first section</title>

      <para>This is the first section in my book.</para>
    </section>
"""*50
import time
from Ft.Xml.Xslt import Processor
from Ft.Xml import InputSource
from Ft.Lib import Uri

start = time.time()
# Base URI for the in-memory stylesheet and source document
PATH = Uri.OsPathToUri('.', attemptAbsolute=1)
N = 1
for i in range(N):
    print ".",
    # Fresh processor per run, so key preparation is redone every time
    processor = Processor.Processor()
    transform = InputSource.DefaultFactory.fromString(XSLT%"<done/>",
                                                      PATH+"db.xslt")
    processor.appendStylesheet(transform)
    source = InputSource.DefaultFactory.fromString(DOC%MIDDLE,
                                                   PATH+"doc.xml")
    print ".",
    processor.run(source)

# One more run that dumps the keys computed for the l10n.xml document.
# The template body is single-quoted because it contains double quotes.
processor = Processor.Processor()
transform = InputSource.DefaultFactory.fromString(
    XSLT%'<xsl:for-each select="$l10n.xml"><f:dump-keys/></xsl:for-each>',
    "http://spam.com/identity.xslt")
processor.appendStylesheet(transform)
source = InputSource.DefaultFactory.fromString(DOC%MIDDLE,
                                               "http://spam.com/doc1.xml")
output = processor.run(source)
end = time.time()
print
print "time for", N, "runs:", end-start, "seconds"
print "time per run", (end-start)/N, "seconds"
print "dump of keys:"
print output
print
--=-V0cjoJfkKo2UZu2/tsVQ--