[XML-SIG] [ pyxml-Bugs-474708 ] Unicode 'junk' bug

noreply@sourceforge.net noreply@sourceforge.net
Wed, 24 Oct 2001 18:46:22 -0700


Bugs item #474708, was opened at 2001-10-24 18:46
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=474708&group_id=6473

Category: expat
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Unicode 'junk' bug

Initial Comment:
Expat does not correctly parse an XML document that 
contains UTF-8 encoded Unicode. If the document has 
more than one pair of XML tags, parsing fails on the 
second element with 'junk after document element'.

<?xml version="1.0" encoding="UTF-8"?>
<text>ascii</text>
<text>
 あいうえおかきくけこあいうえおかきくけこあいうえおかきくけこあいうえおかきくけこ影　
Above is some Japanese text.</text>

Parsing the file above results in

xml.parsers.expat.ExpatError: junk after document element: line 3, column 0

whereas removing the first <text></text> pair results 
in a successfully parsed file.
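A short reproduction script may help whoever picks up this report. The sketch below is an assumption-laden illustration, not part of the original report: it uses Python 3's `xml.parsers.expat` module (the report itself comes from the PyXML / Python 2 era), and it rebuilds the document with the Japanese text as decoded from the garbled bytes in the archived message. The `try_parse` helper and the `<root>` wrapper in the second document are hypothetical additions for comparison. XML 1.0 requires a well-formed document to have exactly one document (root) element, and expat reports content after that element as "junk after document element". Parsing the wrapped variant checks whether the error persists once that rule is satisfied, which separates it from the UTF-8 content.

```python
import xml.parsers.expat

# Japanese text as decoded from the archived message (assumption: the
# garbled "ã~A~B..." sequences are UTF-8 bytes shown as Latin-1).
JAPANESE = "あいうえおかきくけこ" * 4 + "影\u3000"

# The document from the report: two top-level <text> elements, UTF-8 bytes.
REPORTED = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<text>ascii</text>\n"
    "<text>\n " + JAPANESE + "\nAbove is some Japanese text.</text>\n"
).encode("utf-8")

# Hypothetical variant: the same two elements wrapped in a single <root>.
WRAPPED = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<root>\n"
    "<text>ascii</text>\n"
    "<text>\n " + JAPANESE + "\nAbove is some Japanese text.</text>\n"
    "</root>\n"
).encode("utf-8")


def try_parse(data):
    """Parse bytes with a fresh expat parser.

    Returns (collected character data, None) on success,
    or (None, ExpatError) if parsing fails.
    """
    parser = xml.parsers.expat.ParserCreate()
    chunks = []
    # Expat may split character data into several callbacks; collect them all.
    parser.CharacterDataHandler = chunks.append
    try:
        parser.Parse(data, True)  # True marks this as the final chunk
    except xml.parsers.expat.ExpatError as err:
        return None, err
    return "".join(chunks), None


if __name__ == "__main__":
    for name, data in (("reported", REPORTED), ("wrapped", WRAPPED)):
        text, err = try_parse(data)
        if err is not None:
            print(name, "->", xml.parsers.expat.ErrorString(err.code),
                  "at line", err.lineno, "column", err.offset)
        else:
            print(name, "-> parsed; Japanese text present:", JAPANESE in text)
```

If the reported document fails while the wrapped one parses and still contains the Japanese text, that would point to the single-root-element rule rather than to UTF-8 handling as the trigger.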


----------------------------------------------------------------------