[XML-SIG] unicode, latin-1 and DOM...
Uche Ogbuji
uche.ogbuji@fourthought.com
Thu, 28 Jun 2001 07:19:00 -0600
> Hello everyone,
>
> I'm struggling with unicode and stuff (so expect some mails in the coming
> days). Here's the first one. I'm aware that the XML document being parsed
> is not correct (no encoding header), but I'm surprised by the result I get:
>
> >>> from xml.dom.ext.reader import Sax2
> >>> d = Sax2.FromXml('<d>été</d>')
> >>> from xml.dom.ext import PrettyPrint
> >>> PrettyPrint(d)
> <?xml version='1.0' encoding='UTF-8'?>
> <!DOCTYPE d>
> <d/>
> >>> d.documentElement
> <Element Node at 81b14c4: Name='d' with 0 attributes and 0 children>
>
> I'm using Python 2.1, the CVS version of PyXML, and 4Suite 0.11.1b2.
>
> I would have expected a parse error when the latin-1 characters were
> encountered, and not a silent failure to create the Text node.
The parser is probably blowing up, and 4DOM is improperly masking the error.
Or maybe not. pDomlette shows the same problem:
>>> from Ft.Lib.pDomlette import PyExpatReader
>>> reader = PyExpatReader()
>>> doc = reader.fromString('<d>été</d>')
>>> doc.documentElement
<Domlette Element Node at 81e4c64: name='d' with 0 attributes and 0 children>
>>>
I'll have a quick look.
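For comparison, here is a small sketch of the behaviour the original poster expected. It uses Python 3's standard-library xml.dom.minidom (which is expat-based), not the PyXML 4DOM or pDomlette readers discussed above, so it is only an illustration. The assumption is that a conforming parser, finding no encoding declaration, treats the bytes as UTF-8 and reports a fatal error on the latin-1 0xE9 byte rather than silently dropping the text. The helper parse_or_error is hypothetical.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError

# Latin-1 bytes for '<d>été</d>' with no XML declaration. Without a
# declaration the parser assumes UTF-8, and 0xE9 followed by 't' is not
# a valid UTF-8 sequence.
undeclared = '<d>été</d>'.encode('latin-1')

def parse_or_error(data):
    """Hypothetical helper: return the root element's text, or the parse error."""
    try:
        doc = xml.dom.minidom.parseString(data)
    except ExpatError as exc:
        return 'error: %s' % exc
    return doc.documentElement.firstChild.data

# Expected to report a well-formedness error instead of an empty element.
print(parse_or_error(undeclared))

# Declaring the encoding lets the same bytes parse into a Text node.
declared = b"<?xml version='1.0' encoding='ISO-8859-1'?>" + undeclared
print(parse_or_error(declared))
```

The second call shows that the same bytes are fine once the document says what encoding it is in, which is why an error (not an empty element) is the reasonable result for the undeclared case.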
Note: you shouldn't be using the deprecated Sax2 "From*" functions.
-- 
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
XML strategy, XML tools (http://4Suite.org), knowledge management