[python-win32] Properly encoded HTML from MSXML XSLT processor into Python string (via IStream)?
Andreas Neubauer
aneubauer at ra.rockwell.com
Mon Sep 10 21:23:57 CEST 2007
Dear all,
Using the Microsoft XML Core Services (MSXML 4.0) as an XSLT processor for
Python, I ran into a trap when trying to generate properly Unicode (UTF-8)
encoded HTML: the encoding declaration gets lost from the HTML header, and
non-breaking spaces (UTF-8 hex C2 A0) are converted to A0.
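For reference, C2 A0 and A0 are the same character, U+00A0 (no-break space), in two different encodings. A small illustration; that the single A0 byte comes from a single-byte code page such as Latin-1 somewhere in the pipeline is an assumption, not a diagnosis.

```python
# U+00A0 (NO-BREAK SPACE), seen as C2 A0 / A0 in the output
nbsp = "\u00a0"

utf8_bytes = nbsp.encode("utf-8")      # two bytes in UTF-8
latin1_bytes = nbsp.encode("latin-1")  # one byte in Latin-1 (assumed culprit)

print(utf8_bytes.hex().upper())    # C2A0
print(latin1_bytes.hex().upper())  # A0
```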
From testing and reading the Microsoft documentation, I found that this works
fine if the target output is of type IStream.
Can I somehow use a Microsoft IStream object from Python, or implement one in a
suitable manner?
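A minimal sketch of implementing one in Python, assuming pywin32's COM server support: according to the Microsoft documentation quoted in this message, Write is the only IStream method the processor calls, so the class below implements only Write and buffers the bytes. `OutputSink` and `attach_sink` are hypothetical names, and wrapping via `win32com.server.util.wrap(..., pythoncom.IID_IStream)` is an assumed usage that has not been tested against MSXML 4.0.

```python
try:
    import pythoncom  # part of pywin32, Windows only
    _ISTREAM_IIDS = [pythoncom.IID_IStream]
except ImportError:  # lets the pure-Python part run anywhere
    pythoncom = None
    _ISTREAM_IIDS = []


class OutputSink:
    """Hypothetical write-only IStream that buffers whatever MSXML writes."""

    # Attributes read by pywin32's default COM server policy
    _public_methods_ = ["Write"]
    _com_interfaces_ = _ISTREAM_IIDS

    def __init__(self):
        self._chunks = []

    def Write(self, data):
        # The IStream gateway passes the buffer as a byte string and
        # expects the number of bytes written as the return value
        self._chunks.append(bytes(data))
        return len(data)

    def getvalue(self):
        """All bytes written so far (encoded per <xsl:output>, per the docs)."""
        return b"".join(self._chunks)


def attach_sink(xsl_proc, sink):
    """Windows only (assumed usage): make `sink` the custom output."""
    from win32com.server.util import wrap
    # MSXML QueryInterfaces this value for IStream when transform() starts
    xsl_proc.output = wrap(sink, pythoncom.IID_IStream)
```

The intended order on Windows would be: create `sink = OutputSink()`, call `attach_sink(xslProc, sink)`, then `xslProc.transform()`, and finally read `sink.getvalue()`.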
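The XSLT stylesheet controls it like this:
-----------------------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output version="1.0" method="html" indent="no" encoding="UTF-8"/>
  ...
</xsl:stylesheet>
-----------------------------------------------------------------------------------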
However, this output configuration is ignored if the IXSLProcessor is not
given a custom output (e.g. an IStream object).
Microsoft: "When a new transform is started, the processor will use a
QueryInterface this output for IStream. When the transform is complete or
reset is called, IStream is released. The only method that is used on
IStream is Write. The bytes written to the stream will be encoded
according to the encoding attribute on the <xsl:output> element.
If you do not provide a custom output, then you will get a string when you
read this property. The string contains the incrementally buffered
transformation result.
Reading this property has the side effect of resetting that internal
buffer so that each time you read the property you get the next chunk of
output. In this case, the output is always generated in the Unicode
encoding, and the encoding attribute on the <xsl:output> element is
ignored."
As in this Python example:
-----------------------------------------------------------------------------------
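import win32com.client

# Generate/load the makepy wrapper for the MSXML 4.0 type library
_msxmlLib = win32com.client.gencache.EnsureModule(
    "{F5078F18-C551-11D3-89B9-0000F81FE221}", 0, 4, 0)
...
xslt = win32com.client.dynamic.Dispatch("Msxml2.XSLTemplate.4.0")
...
xslProc = xslt.createProcessor()
xslProc.input = xmlDoc
xslProc.transform()
# No custom output: this returns a Unicode string and the
# encoding attribute on <xsl:output> is ignored
xmlData = xslProc.output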
-----------------------------------------------------------------------------------
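As a fallback when staying with the string output: per the quoted documentation the result is Unicode text, so one option is to encode it explicitly when writing it out. A minimal sketch, assuming `xmlData` holds that string; `save_html_utf8` is a hypothetical helper. It fixes the bytes on disk but does not restore the encoding statement in the HTML header.

```python
import io


def save_html_utf8(html_text, path):
    """Write the Unicode string read from xslProc.output as UTF-8 bytes."""
    # io.open encodes on write, so U+00A0 is stored as C2 A0, not A0;
    # newline="" keeps the line endings exactly as MSXML produced them
    with io.open(path, "w", encoding="utf-8", newline="") as f:
        f.write(html_text)
```

Usage would be e.g. `save_html_utf8(xmlData, "out.html")`, where the file name is only an example.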
If I use a custom output like this:
-----------------------------------------------------------------------------------
xmlData = ""
xslProc.transform()
xslProc.output(xmlData)  # 'output' is a property: this reads it, then calls the result
-----------------------------------------------------------------------------------
This fails with the following error:
"Exception during xslt transformation: 'unicode' object is not callable"
Is there a way in Python to provide an appropriate object to receive the
output, so that the encoding statement is not ignored?
Is there any proven way of using IXSLProcessor to generate properly encoded
HTML into Python?
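One candidate, sketched under assumptions: the "'unicode' object is not callable" error is raised by Python, because `output` is a property, so a custom output has to be assigned before `transform()` rather than called afterwards. The sketch uses `pythoncom.CreateStreamOnHGlobal()` (an in-memory IStream from pywin32) and assumes the processor accepts it through late-bound dispatch; it has not been tested against MSXML 4.0. `transform_to_bytes` and its `make_stream` parameter are hypothetical, the latter only so the logic can be exercised without COM.

```python
STREAM_SEEK_SET = 0  # values of the COM STREAM_SEEK enumeration
STREAM_SEEK_END = 2


def transform_to_bytes(xsl_proc, make_stream=None):
    """Run the transform into an IStream and return the encoded bytes."""
    if make_stream is None:
        import pythoncom  # pywin32, Windows only
        make_stream = pythoncom.CreateStreamOnHGlobal  # in-memory IStream

    stream = make_stream()
    # Assign the custom output BEFORE transforming (property, not a method)
    xsl_proc.output = stream
    xsl_proc.transform()

    # Seek returns the new absolute position, i.e. the total size here
    size = stream.Seek(0, STREAM_SEEK_END)
    stream.Seek(0, STREAM_SEEK_SET)
    return stream.Read(size)
```

On Windows this would be called as `html_bytes = transform_to_bytes(xslProc)`, with the bytes expected to follow the `encoding` attribute of `<xsl:output>`.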
Kind regards
Andreas Neubauer