[XML-SIG] Replacing a Java tool-chain with 4Suite?
Uche Ogbuji
uche.ogbuji@fourthought.com
Wed, 15 Jan 2003 20:16:06 -0700
> Well, I eventually tracked down some of the problems. (With Uche's help).
>
> There were missing /'s in tags of a set of files (which were included
> about 4 times via entity references). This was giving "unbalanced tag"
> errors when running the 4xslt transform, but not reporting the
> (included) file-name correctly (file was reported as being the root
> file, so the line-number was just weird, pointing in the middle of the
> DTD). Simply writing a Python script that loaded it manually gave me the
> names of the files that were causing the problems. If I'm correct,
> these missing ETAGs were being ignored by Saxon because they were
> reducible markup (in SGML parlance, don't know what it's called with
> XML), while the Python parsers required well-formed markup.
>
> When presented with the (opaque) error reports from 4xslt, I'd enabled
> validation in the hope that the error report would kick out more
> information, and in doing so tickled the DocBook-validation problem, and
> thus got side-tracked. (xmlproc_val does report errors on processing the
> docbook dtd, I've sent a message to the docbook list to that effect, as
> requested).
>
> Regarding speed, I must be mis-using something, or there may be a
> problem with our xsl script. I let the command:
>
> 4xslt -o test.xml doc\manual\manual.xml doc\xsl\merge.xsl
>
> run for close to three hours before killing it. It was running in
> constant memory (about 45MB), with 100% CPU utilisation for the entire
> run. Simply loading and pretty-printing the Python-specific template
> files (which are smaller than the original API docs, but not 100s of
> times smaller) only takes seconds (well, maybe a minute), so I'm
> thinking there's something getting into an infinite loop during the
> processing.
>
> Given the simplicity of the transformation in this case (just a merge by
> section name!) I may write the darned thing in Python to save time (I
> started out around 22 hours ago saying to myself "oh, guess I should
> regenerate the manual before I start working on the web-site" :) ).
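For anyone who hits the same "unbalanced tag" report, the per-file check described above can be sketched roughly as follows. This is not the script that was actually used. The function names `check_fragment` and `check_files` are made up for illustration, and it assumes each included file is meant to be well-formed element content, so it wraps the text in a dummy element before handing it to expat. A file that uses entities declared only in the main document's DTD will be flagged as an error too.

```python
import xml.parsers.expat


def check_fragment(text):
    """Return None if text parses as well-formed element content,
    otherwise return expat's error message as a string."""
    # A text declaration (<?xml ...?>) is only legal at the very start of
    # an entity, so drop it before wrapping the content in a dummy root.
    if text.startswith("<?xml"):
        end = text.find("?>")
        if end != -1:
            text = text[end + 2:]
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The wrapper lets a fragment with several top-level elements parse.
        parser.Parse("<wrapper>" + text + "</wrapper>", True)
    except xml.parsers.expat.ExpatError as err:
        return str(err)
    return None


def check_files(paths):
    """Map each path whose content fails the check to its error message."""
    problems = {}
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            error = check_fragment(handle.read())
        if error is not None:
            problems[path] = error
    return problems
```

Calling `check_files` with the list of entity-included files gives back only the ones that fail, keyed by file name, which is the information the 4xslt report was missing.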
Can you post merge.xsl? I'm guessing it simply operates on DocBook
section/title elements and thus could work with any DocBook source file? Do
you use recursive templates in merge.xsl? It's really easy to get into an
infinite loop with recursive XSLT templates.
Based on the simplicity of the transform you're doing, it certainly looks like
a really easy job using the sorts of generator/iterator tools I outline in
http://www.xml.com/pub/a/2003/01/08/py-xml.html
I like XSLT better than most, but it's not for every task.
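To give a rough idea of the generator approach, here is a minimal sketch using the standard library's minidom. Since I haven't seen merge.xsl, what "merge by section name" means here is purely an assumption: for every section in the base document, append the non-title children of the section with the same title from a second document. The names `doc_order_iter`, `section_title`, `sections_by_title` and `merge_by_title` are illustrative only, and nested sections with matching titles are not treated specially.

```python
from xml.dom import Node, minidom


def doc_order_iter(node):
    """Yield node and all of its descendants in document order."""
    yield node
    for child in node.childNodes:
        yield from doc_order_iter(child)


def is_element(node, name):
    return node.nodeType == Node.ELEMENT_NODE and node.tagName == name


def section_title(section):
    """Return the text of the section's direct title child, or None."""
    for child in section.childNodes:
        if is_element(child, "title"):
            return "".join(
                n.data for n in doc_order_iter(child)
                if n.nodeType == Node.TEXT_NODE
            ).strip()
    return None


def sections_by_title(doc):
    """Index every titled section element in doc by its title text."""
    index = {}
    for node in doc_order_iter(doc):
        if is_element(node, "section"):
            title = section_title(node)
            if title is not None:
                index[title] = node
    return index


def merge_by_title(base_doc, extra_doc):
    """Append the non-title children of each extra section to the base
    section that carries the same title (assumed merge semantics)."""
    extra = sections_by_title(extra_doc)
    # Collect the base sections first so the tree is not modified
    # while the generator is still walking it.
    base_sections = [n for n in doc_order_iter(base_doc)
                     if is_element(n, "section")]
    for section in base_sections:
        match = extra.get(section_title(section))
        if match is None:
            continue
        for child in list(match.childNodes):
            if is_element(child, "title"):
                continue
            section.appendChild(base_doc.importNode(child, True))
    return base_doc
```

Both documents are loaded with `minidom.parse`, and the result is written out with `merged.toxml()`. Each pass over the tree is a single linear walk, so there is no recursion over the section list that could fail to terminate the way a runaway XSLT template can.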
--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://4Suite.org http://fourthought.com
Python Generators + DOM - http://www.xml.com/pub/a/2003/01/08/py-xml.html
4Suite Repository Features - https://www6.software.ibm.com/reg/devworks/dw-x4suite5-i/
XML class warfare - http://www.adtmag.com/article.asp?id=6965
MusicBrainz metadata - http://www-106.ibm.com/developerworks/xml/library/x-think14.html