[XML-SIG] [ pyxml-Bugs-1365605 ] PrettyPrint with UTF-16 encoding produces invalid XML.

SourceForge.net noreply at sourceforge.net
Thu Nov 24 15:43:12 CET 2005


Bugs item #1365605, was opened at 2005-11-24 14:43
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=106473&aid=1365605&group_id=6473

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: MikeW (mrdubya)
Assigned to: Nobody/Anonymous (nobody)
Summary: PrettyPrint with UTF-16 encoding produces invalid XML.

Initial Comment:
With encoding='UTF-16', every string written to the
output stream by PrettyPrint() has the default BOM
prefixed to it, not just the first one.
Using the UTF-16LE encoding stops any BOM from
appearing in the output stream, including before the
XML declaration.  Either way the resultant XML is
not valid: section 4.3.3 of XML 1.0 3e requires a
UTF-16 entity to begin with a BOM.
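
The repeated-BOM effect can be reproduced without PyXML.  What follows is
only a minimal sketch in present-day Python 3 (not the Python 2.4 of this
report), and it does not claim to mirror PrettyPrint's internals.  The
pieces list and the count_boms() helper are hypothetical names made up for
the illustration.  It compares encoding each string on its own with
encoding all of them through a single incremental encoder.

```python
import codecs

# Hypothetical fragments, roughly what a printer might write one at a time.
pieces = [
    "<?xml version='1.0' encoding='UTF-16'?>",
    "<HelloWorld",
    " Language",
    '="',
    "English",
    '"',
    "/>",
]


def count_boms(data):
    """Count native-order UTF-16 BOMs found at 2-byte aligned offsets."""
    bom = codecs.BOM_UTF16
    return sum(1 for i in range(0, len(data) - 1, 2) if data[i:i + 2] == bom)


# Encoding every piece independently: the "utf-16" codec adds a BOM each time.
per_piece = b"".join(p.encode("utf-16") for p in pieces)

# Encoding through one incremental encoder: the BOM is emitted only once.
encoder = codecs.getincrementalencoder("utf-16")()
streamed = b"".join(encoder.encode(p) for p in pieces)
streamed += encoder.encode("", final=True)

print("BOMs when encoding per piece:", count_boms(per_piece))
print("BOMs when encoding as a stream:", count_boms(streamed))
```

In the per-piece output only the first BOM is consumed by a decoder; the
rest come back as U+FEFF characters inside the markup.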

Element attribute output is also invalid.  The =" and
final " are written unencoded, so they end up paired
with bytes from the BOM codepoint.  Using UTF-16LE does
not correct the output, for similar reasons.
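
Why unencoded punctuation breaks the attribute can be shown with a small
sketch, again in present-day Python 3 and again only an illustration under
assumptions: the variable names are hypothetical, and UTF-16LE is used so
that no BOM is involved.  Raw single-byte quote and equals characters are
spliced between correctly encoded 16-bit code units.

```python
# Attribute name and value encoded as UTF-16LE (two bytes per character).
name = " Language".encode("utf-16-le")
value = "English".encode("utf-16-le")

# The =" and closing " are spliced in as raw single bytes (the reported bug).
mixed = name + b'="' + value + b'"'

# What the output should have been: every character encoded.
correct = ' Language="English"'.encode("utf-16-le")

# The raw bytes 0x3D 0x22 are read as ONE code unit, and the lone trailing
# quote leaves an odd byte count, so strict decoding fails.
try:
    mixed.decode("utf-16-le")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

print("mixed length:", len(mixed), "decodes cleanly:", decoded_ok)
print("without the stray last byte:", repr(mixed[:-1].decode("utf-16-le")))
print("correct output:", repr(correct.decode("utf-16-le")))
```

Dropping the stray final byte lets the data decode, but the =" pair has
fused into a single unrelated character, so the attribute syntax is lost.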

I assume UTF-32 output will suffer from similar
problems.

PyXML version 0.8.4 and Python version 2.4.1

Example code:

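# Imports assumed from the PyXML 0.8.4 package layout
from xml.dom import Document
from xml.dom.ext import PrettyPrint

# Build a one-element document with a single attribute
dom = Document.Document(None)
element = dom.createElement('HelloWorld')
element.setAttribute('Language', 'English')
dom.appendChild(element)
# Write it out as UTF-16; the resulting pp.xml shows the problem
PrettyPrint(dom, stream=file('./pp.xml', 'w+b'), encoding='UTF-16')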

