[XML-SIG] serializing with xslt with SAX
Paul Tremblay
phthenry at earthlink.net
Sun Feb 15 03:20:01 EST 2004
On Sat, Feb 14, 2004 at 07:07:43PM -0700, Mike Brown wrote:
>
> In the Python world, SAX is not necessarily the most efficient. For example,
> 4Suite uses Expat to do parsing of serialized XML, and it builds Domlette
> documents from Expat's native callbacks (which are somewhat SAX-like, but
> different). It's more efficient to supply a Domlette to the processor than it
> is to supply an unparsed document or even Expat callbacks. The processor does
> support SAX, and Domlette (as Result Tree Fragment) output, though, so we
> could perhaps write a SAX-to-Expat layer for use in conjunction with the SAX
> XSLT output writer, or we could write an Expat XSLT output writer, but we're
> better off just using our Result Tree Fragment writer, which generates
> Domlette nodes that can be fed directly to the next transformation instance.
I had suspected that the advice from Java guru Michael Kay was biased
towards Java.
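
To make the point above concrete, here is a minimal sketch of the difference between re-parsing a serialized document at every step and handing an already-parsed node tree from one step to the next. It is not 4Suite code: it uses the standard library's xml.etree.ElementTree as a stand-in for Domlette, and the two stage functions (uppercase_text, wrap_in_envelope) are hypothetical placeholders for real transformations. It is written for a current Python rather than the Python of this thread.

```python
import xml.etree.ElementTree as ET

def uppercase_text(root):
    """Stand-in for one transformation: uppercase every element's text."""
    for elem in root.iter():
        if elem.text:
            elem.text = elem.text.upper()
    return root

def wrap_in_envelope(root):
    """Stand-in for a second transformation: wrap the tree in a new root."""
    envelope = ET.Element("envelope")
    envelope.append(root)
    return envelope

def chain_via_strings(xml_text, stages):
    """Serialize and re-parse between every stage (the slower route)."""
    for stage in stages:
        root = stage(ET.fromstring(xml_text))
        xml_text = ET.tostring(root, encoding="unicode")
    return xml_text

def chain_via_trees(xml_text, stages):
    """Parse once, pass the node tree from stage to stage, serialize once."""
    root = ET.fromstring(xml_text)
    for stage in stages:
        root = stage(root)
    return ET.tostring(root, encoding="unicode")

stages = [uppercase_text, wrap_in_envelope]
source = "<quote><line>to be or not to be</line></quote>"
print(chain_via_trees(source, stages))
```

Both functions give the same output; the second simply avoids one parse and one serialization per intermediate step, which is the saving described in the quoted paragraph.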
>
> We don't yet have a good chaining API or recipe for 4Suite in general, and in
> researching our capabilities in order to answer this question, Jeremy & I
> found some bugs that have since been fixed in CVS. The code sample below is an
> example that should work with a current CVS snapshot, and is pretty fast,
> although Jeremy points out that Processor re-use is not thoroughly tested and
> the overhead of creating a new Processor instance is minimal in comparison to
> going through all the things that happen when the Processor.reset() is called.
So if the overhead of creating a new Processor is minimal, can I use the
code below?
from Ft.Xml import InputSource
from Ft.Xml.Xslt.Processor import Processor
# first run
document = InputSource.DefaultFactory.fromUri(xmlfile)
stylesheet = InputSource.DefaultFactory.fromUri(xsltfile)
processor = Processor()
processor.appendStylesheet(stylesheet)
result = processor.run(document)
# second run. And so on.
document = InputSource.DefaultFactory.fromString(result)
stylesheet = InputSource.DefaultFactory.fromUri(xsltfile)
processor = Processor()
processor.appendStylesheet(stylesheet)
result2 = processor.run(document)
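
The two runs above follow one pattern: create a fresh processor, append one stylesheet, run it, and feed the string result into the next run. A rough sketch of that pattern as a loop follows. StandInProcessor is a hypothetical stand-in so the example runs without 4Suite installed; it only mimics the appendStylesheet() and run() calls used above, and its "stylesheets" are plain string-to-string functions rather than real XSLT. It is written for a current Python.

```python
class StandInProcessor:
    """Hypothetical stand-in for an XSLT processor; not the 4Suite class."""

    def __init__(self):
        self.stylesheets = []

    def appendStylesheet(self, stylesheet):
        self.stylesheets.append(stylesheet)

    def run(self, document):
        # A "stylesheet" here is just a function from string to string.
        for stylesheet in self.stylesheets:
            document = stylesheet(document)
        return document

def run_chain(document, stylesheets, make_processor=StandInProcessor):
    """Apply each stylesheet in turn, using a fresh processor per step."""
    result = document
    for stylesheet in stylesheets:
        processor = make_processor()
        processor.appendStylesheet(stylesheet)
        result = processor.run(result)
    return result

print(run_chain("<a>hamlet</a>",
                [str.upper, lambda s: s.replace("A>", "B>")]))
```

With a real processor, the body of the loop would presumably wrap each intermediate string in an input source first, as the second run above does with fromString().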
I'll have to download a CVS snapshot to test the code below. But I think
I need something more standard, since the scripts I'm working with will
be published.
I'm coming to the realization that XSLT processing isn't absolutely
standard across languages. TrAX was supposed to provide a universal
interface, but as of now it only works with two processors: Saxon and
Xalan.
That means if you write an application to process XML with XSLT
stylesheets, you will be using either Java or Perl/Python (etc.) with C
or C++ libraries.
By the way, do you know how to read from and write to a string using
libxslt? I couldn't find anything on the web about that.
Okay, I have a lot of questions about this example.
> from Ft.Xml import InputSource, Domlette
> from Ft.Xml.Xslt import Processor, RtfWriter
I actually don't know what Rtf is, though I keep hearing this term.
>
> class Test:
>     # we're going to try to reuse the processor
>     p = Processor.Processor()
>
>     def run(self, src_isrc, chain):
>         i = 0
>         if not chain:
>             return ''
>         for (sty, uri) in chain:
>             sty_isrc = InputSource.DefaultFactory.fromString(sty, uri)
>             self.p.appendStylesheet(sty_isrc)
>             # not on last stylesheet in chain?
>             if i < len(chain) - 1:
>                 # use an RtfWriter
>                 w = RtfWriter.RtfWriter(None, 'urn:temp.xml')
You are setting up an RtfWriter--what is that?
Why the "urn" prefix?
>                 # not on first stylesheet in chain?
>                 if i:
>                     # use last RtfWriter's buffer as source doc
>                     self.p.execute(result, src_isrc, writer=w)
But here you use p.execute.
>                 else:
>                     # use original source doc
>                     self.p.run(src_isrc, writer=w)
Okay, so the first time you use p.run. Why is that?
>                 # save result to use as source doc next time
>                 result = w.getResult()
Save to a string
>             # last stylesheet in chain
>             else:
>                 if w:
Why wouldn't the Rtf writer be defined?
>                     result = self.p.execute(result, src_isrc)
>                 else:
>                     result = self.p.run(src_isrc)
>             self.p.reset()
>             i += 1
>         return result
>
>
> xml_isrc = InputSource.DefaultFactory.fromString(src_xml, 'urn:hamlet.xml')
>
> # four 6-letter rotations + a 2-letter rotation and uppercasing
> # should result in a full rotation and uppercasing...
> # expected output is an uppercase version of the Hamlet quotation
> #
> chain = [(xslt1, 'urn:lc-rot6.xsl'),
>          (xslt1, 'urn:lc-rot6.xsl'),
>          (xslt1, 'urn:lc-rot6.xsl'),
>          (xslt1, 'urn:lc-rot6.xsl'),
>          (xslt2, 'urn:lc-rot2-uc.xsl'),
>          ]
Sorry to be dense here, but what does each tuple represent? Is the
first item a name or a path? Is the second item some type of URI?
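
Going only by the loop in the quoted code (for (sty, uri) in chain, followed by fromString(sty, uri)), each tuple appears to be unpacked into a string holding the stylesheet text and a URI that identifies it. The sketch below just walks such a list to show the unpacking and the "last item" check. The two stylesheet strings are hypothetical placeholders, since the real xslt1 and xslt2 are not shown in this message, and describe() is a made-up helper. It is written for a current Python.

```python
# Hypothetical placeholders; the real stylesheet texts are not shown here.
xslt1 = "<!-- stylesheet text for the 6-letter rotation -->"
xslt2 = "<!-- stylesheet text for the 2-letter rotation and uppercasing -->"

# Same shape as the quoted chain: (stylesheet text, URI) pairs.
chain = [(xslt1, "urn:lc-rot6.xsl")] * 4 + [(xslt2, "urn:lc-rot2-uc.xsl")]

def describe(chain):
    """Return one summary line per (stylesheet text, URI) pair."""
    lines = []
    for i, (sty, uri) in enumerate(chain):
        is_last = (i == len(chain) - 1)
        lines.append("step %d: %d chars of stylesheet text, URI %s%s"
                     % (i + 1, len(sty), uri, " (last)" if is_last else ""))
    return lines

for line in describe(chain):
    print(line)
```

Using enumerate() replaces the hand-maintained counter i in the quoted code, but the test for the final stylesheet is the same comparison against len(chain) - 1.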
>
> t = Test()
> print t.run(xml_isrc, chain)
Thanks for all your help.
Paul
--
************************
*Paul Tremblay *
*phthenry at earthlink.net*
************************