[XML-SIG] serializing with xslt with SAX

Mike Brown mike at skew.org
Sat Feb 14 21:07:43 EST 2004


Paul Tremblay wrote:
> I need to chain several stylesheets together in order to transform a
> document. I was told that I should use SAX to perform this
> serialization. However, the only documentation I came across was doing so
> with Java.
> 
> I have used Python SAX in the past, but only to process XML documents. I
> don't know how to use it in conjunction with an XSLT processor. I have
> the O'Reilly book *Python & XML,* but this didn't help me.

Stylesheet chaining means feeding the output of one transformation to the
next as its input. The simplest way to do that is to run each transformation
normally, serialize its result as XML, and let the next transformation
reparse it.
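
As a minimal sketch of that serialize-and-reparse pattern, using only the
Python standard library: since the standard library has no XSLT processor,
the two step_* functions below are hypothetical stand-ins for real
transformations. Each one accepts serialized XML and returns serialized XML,
so every hop in the chain pays for a full parse and a full serialization.

```python
import xml.etree.ElementTree as ET


def step_upper(xml_text):
    # Stand-in for a transformation: parse, uppercase the text, reserialize.
    root = ET.fromstring(xml_text)
    root.text = (root.text or '').upper()
    return ET.tostring(root, encoding='unicode')


def step_wrap(xml_text):
    # Second stand-in: parse again and wrap the root in a new element.
    root = ET.fromstring(xml_text)
    wrapper = ET.Element('wrapped')
    wrapper.append(root)
    return ET.tostring(wrapper, encoding='unicode')


def chain_serialized(xml_text, steps):
    # Each step's serialized output becomes the next step's input.
    for step in steps:
        xml_text = step(xml_text)
    return xml_text


result = chain_serialized('<quote>to wash it white as snow?</quote>',
                          [step_upper, step_wrap])
```

With a real processor, each step_* function would be one transformation run
that writes its result to a string.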

To make this more efficient, hand each result document to the next
transformation in whatever form your XSLT processor consumes most cheaply.
Check which APIs your processor supports, find the most efficient way to
supply a source document, and see whether result documents can be output
directly in that form, or at least converted to it without a full
serialization and reparse.

In the Java world, the advice is to use SAX because it is fast and it is
natively supported by all the processors that support JAXP -- the processor
can be given a source document via SAX event calls and can generate SAX event
calls for the result.
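
Python's standard library has a comparable event pipeline in
xml.sax.saxutils.XMLFilterBase, so the shape of a SAX chain can be sketched
without Java. This is not XSLT: TranslateFilter is a hypothetical stand-in
for a transformation stage, and the element name and sample text are
made up for illustration. The point is only that each stage receives SAX
events and emits SAX events, with no serialization until the very end.

```python
import io
import string
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator

LOWER = string.ascii_lowercase
ROT13 = LOWER[13:] + LOWER[:13]


class TranslateFilter(XMLFilterBase):
    """Hypothetical stage: rewrites character data, forwards other events."""

    def __init__(self, parent, table):
        XMLFilterBase.__init__(self, parent)
        self._table = table

    def characters(self, content):
        # Translate this chunk of text, then pass it downstream.
        XMLFilterBase.characters(self, content.translate(self._table))


parser = xml.sax.make_parser()
# Stage 1: 13-letter rotation of the lowercase characters.
stage1 = TranslateFilter(parser, str.maketrans(LOWER, ROT13))
# Stage 2: another 13-letter rotation, plus uppercasing.
stage2 = TranslateFilter(stage1, str.maketrans(LOWER, ROT13.upper()))

out = io.StringIO()
stage2.setContentHandler(XMLGenerator(out, encoding='utf-8'))
# Events flow parser -> stage1 -> stage2 -> XMLGenerator.
stage2.parse(io.BytesIO(b'<quote>to wash it white as snow?</quote>'))
output = out.getvalue()
```

Only the final XMLGenerator serializes anything; the intermediate stages
never see markup as text.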

In the Python world, SAX is not necessarily the most efficient. For example,
4Suite uses Expat to parse serialized XML, and it builds Domlette documents
from Expat's native callbacks (which are somewhat SAX-like, but different).
It's more efficient to supply a Domlette to the processor than it is to
supply an unparsed document or even Expat callbacks. The processor does
support both SAX output and Domlette (as Result Tree Fragment) output,
though, so we could perhaps write a SAX-to-Expat layer for use in conjunction
with the SAX XSLT output writer, or we could write an Expat XSLT output
writer. But we're better off just using our Result Tree Fragment writer,
which generates Domlette nodes that can be fed directly to the next
transformation instance.
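
The idea of handing a node tree straight to the next stage can be sketched
with the standard library's ElementTree standing in for Domlette. The
stage_* functions are hypothetical stand-ins for transformations, not
4Suite API. The source is parsed once, each stage receives and returns an
in-memory tree, and serialization happens once at the end.

```python
import xml.etree.ElementTree as ET


def stage_upper(root):
    # Stand-in transformation: build a new result tree from the source tree.
    result = ET.Element(root.tag, root.attrib)
    result.text = (root.text or '').upper()
    return result


def stage_wrap(root):
    # Second stand-in: wrap the previous result tree in a new element.
    wrapper = ET.Element('wrapped')
    wrapper.append(root)
    return wrapper


def chain_trees(xml_text, stages):
    tree = ET.fromstring(xml_text)        # parse once
    for stage in stages:
        tree = stage(tree)                # hand the tree to the next stage
    return ET.tostring(tree, encoding='unicode')  # serialize once


chained = chain_trees('<quote>white as snow</quote>',
                      [stage_upper, stage_wrap])
```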

We don't yet have a good chaining API or recipe for 4Suite in general, and in
researching our capabilities in order to answer this question, Jeremy & I
found some bugs that have since been fixed in CVS. The code sample below
should work with a current CVS snapshot, and is pretty fast. Jeremy points
out, though, that Processor re-use is not thoroughly tested, and that the
overhead of creating a new Processor instance is minimal compared to
everything that happens when Processor.reset() is called.

-Mike


# Just some experimentation...
#
# Don't take this as being the one and only way to do chaining in 4Suite
# (outside of the repository).
# The point is just to demonstrate using different output writers, and to
# give us some food for thought on making a less cumbersome chaining API.
#

src_xml = """<?xml version="1.0" encoding="utf-8"?>
<quote>This cursed hand,
for thicker than itself with brother's blood --
Is there not rain enough in the sweet heavens
to wash it white as snow?
</quote>"""

# A 6-letter rotation of the lowercase characters
#
xslt1 = """<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" indent="no"/>
  <xsl:template match="quote">
    <xsl:copy>
      <xsl:value-of select="translate(.,'abcdefghijklmnopqrstuvwxyz',
                                        'ghijklmnopqrstuvwxyzabcdef')"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>"""

# A 2-letter rotation + uppercasing of the lowercase characters
#
xslt2 = """<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" indent="no"/>
  <xsl:template match="quote">
    <xsl:copy>
      <xsl:value-of select="translate(.,'abcdefghijklmnopqrstuvwxyz',
                                        'CDEFGHIJKLMNOPQRSTUVWXYZAB')"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>"""


from Ft.Xml import InputSource, Domlette
from Ft.Xml.Xslt import Processor, RtfWriter

class Test:
    # we're going to try to reuse the processor
    p = Processor.Processor()

    def run(self, src_isrc, chain):
        i = 0
        # no intermediate writer yet; stays None for a one-stylesheet chain
        w = None
        if not chain:
            return ''
        for (sty, uri) in chain:
            sty_isrc = InputSource.DefaultFactory.fromString(sty, uri)
            self.p.appendStylesheet(sty_isrc)
            # not on last stylesheet in chain?
            if i < len(chain) - 1:
                # use an RtfWriter
                w = RtfWriter.RtfWriter(None, 'urn:temp.xml')
                # not on first stylesheet in chain?
                if i:
                    # use last RtfWriter's buffer as source doc
                    self.p.execute(result, src_isrc, writer=w)
                else:
                    # use original source doc
                    self.p.run(src_isrc, writer=w)
                # save result to use as source doc next time
                result = w.getResult()
            # last stylesheet in chain
            else:
                if w is not None:
                    result = self.p.execute(result, src_isrc)
                else:
                    result = self.p.run(src_isrc)
            self.p.reset()
            i += 1
        return result


xml_isrc = InputSource.DefaultFactory.fromString(src_xml, 'urn:hamlet.xml')

# four 6-letter rotations + a 2-letter rotation and uppercasing
# should result in a full rotation and uppercasing...
# expected output is an uppercase version of the Hamlet quotation
#
chain = [(xslt1, 'urn:lc-rot6.xsl'),
         (xslt1, 'urn:lc-rot6.xsl'),
         (xslt1, 'urn:lc-rot6.xsl'),
         (xslt1, 'urn:lc-rot6.xsl'),
         (xslt2, 'urn:lc-rot2-uc.xsl'),
        ]

t = Test()
print t.run(xml_isrc, chain)
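
The expected result can be checked without 4Suite. The sketch below redoes
the same character mappings with str.translate in plain modern Python;
rotation_table is a hypothetical helper, not part of the script above. Four
6-letter rotations plus one 2-letter rotation add up to 26, a full turn of
the alphabet, so only the uppercasing remains.

```python
import string

LOWER = string.ascii_lowercase


def rotation_table(n, upper=False):
    # Map each lowercase letter to the letter n places later, wrapping around.
    target = LOWER[n:] + LOWER[:n]
    if upper:
        target = target.upper()
    return str.maketrans(LOWER, target)


quote = """This cursed hand,
for thicker than itself with brother's blood --
Is there not rain enough in the sweet heavens
to wash it white as snow?
"""

text = quote
for _ in range(4):
    text = text.translate(rotation_table(6))      # same mapping as xslt1
text = text.translate(rotation_table(2, upper=True))  # same mapping as xslt2
```

Characters that are not lowercase letters, including the capitals already in
the quotation, pass through every step unchanged.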


