[XML-SIG] [ pyxml-Bugs-474708 ] Unicode 'junk' bug
noreply@sourceforge.net
Wed, 24 Oct 2001 18:46:22 -0700
Bugs item #474708, was opened at 2001-10-24 18:46
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=474708&group_id=6473
Category: expat
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: Unicode 'junk' bug
Initial Comment:
Expat does not properly parse an XML document that
contains UTF-8 encoded Unicode text. Any document with
more than one pair of XML tags fails with 'junk after
document element' at the second element.
<?xml version="1.0" encoding="UTF-8"?>
<text>ascii</text>
<text>
あいうえおかきくけこあいうえおかきくけこあいうえおかきくけこあいうえおかきくけこ影　
Above is some Japanese text.</text>
The file above results in
xml.parsers.expat.ExpatError: junk after document
element: line 3, column 0
whereas removing the first <text></text> pair
results in a properly parsed file.
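A minimal reproduction sketch, assuming Python 3's xml.parsers.expat (the report itself predates it). The helper name try_parse and the sample strings are illustrative, not taken from the report, and the Japanese string is an assumed decoding of the sample text. Because an XML document may have only one top-level element, the sketch also tries an ASCII-only document with two <text> elements, which helps separate the effect of the encoding from that of the second element.

```python
import xml.parsers.expat


def try_parse(data):
    """Return None if expat accepts the bytes, else the error message."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True: this is the final chunk of input
    except xml.parsers.expat.ExpatError as exc:
        return str(exc)
    return None


HEADER = '<?xml version="1.0" encoding="UTF-8"?>\n'
# Assumed decoding of the sample text in the report (hiragana run).
JAPANESE = "あいうえおかきくけこ" * 4

# Two top-level <text> elements, the second one holding UTF-8 text.
two_roots_utf8 = (
    HEADER + "<text>ascii</text>\n<text>\n" + JAPANESE + "\n</text>"
).encode("utf-8")

# The same UTF-8 text with the first <text></text> pair removed.
one_root_utf8 = (HEADER + "<text>\n" + JAPANESE + "\n</text>").encode("utf-8")

# Two top-level <text> elements containing only ASCII.
two_roots_ascii = (
    HEADER + "<text>ascii</text>\n<text>more ascii</text>"
).encode("utf-8")

for label, data in [
    ("two elements, UTF-8 text", two_roots_utf8),
    ("one element, UTF-8 text", one_root_utf8),
    ("two elements, ASCII only", two_roots_ascii),
]:
    print(label, "->", try_parse(data) or "parsed without error")
```

If the ASCII-only case fails in the same way, wrapping both <text> elements in a single enclosing element is worth trying before suspecting the UTF-8 handling.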