[XML-SIG] (Py)DOM: Character References
Carsten Oberscheid
co@daisybytes.su.uunet.de
Thu, 18 Mar 1999 17:31:45 +0100
>
> Carsten Oberscheid writes:
> > Can anybody tell why character references are not modeled explicitly in
> > the DOM? In XML they have their own identity, explicitly distinct from entity
> >
>
> Carsten,
> Good question. I don't know why character references need explicit
> nodes in the DOM; I'm not terribly interested in knowing that
> something was encoded as "&#43;" or "&#x2b;".
Ok, since charrefs encode only characters from the document's base character
set (Unicode for XML, ASCII for SGML -- is that right?), it would be
unnecessary overhead to create a distinct DOM node for each charref. Forget
that, should have thought before I wrote...
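
To illustrate the point, here is a minimal sketch. It assumes the standard
library's xml.dom.minidom as a stand-in for the xml.dom.utils reader used in
this thread, and text_of is a helper name invented for this example. A decimal
reference, a hexadecimal reference and the literal character should all end up
as the same text data, with no separate node for the reference:

```python
from xml.dom import minidom


def text_of(xml_source):
    """Return the concatenated text of the document element's text children."""
    doc = minidom.parseString(xml_source)
    return "".join(
        node.data
        for node in doc.documentElement.childNodes
        if node.nodeType == node.TEXT_NODE
    )


# Three spellings of the plus sign: decimal charref, hex charref, literal.
decimal_form = text_of("<thing>&#43;</thing>")
hex_form = text_of("<thing>&#x2b;</thing>")
literal_form = text_of("<thing>+</thing>")

# The parser resolves the references before the DOM is built, so the
# three results are expected to compare equal.
print(decimal_form == hex_form == literal_form)
```

Other parsers may split the text into several nodes, which is why the helper
joins them instead of looking at the first child only.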
> I would like to be able to
> have this:
>
> <!DOCTYPE thing>
> <thing>&foo;</thing>
>
> provide a reference to &foo; as a child of the <thing> node. Here's
> what I get now:
>
> >>> buffer = '<!DOCTYPE thing>\n<thing>&foo;</thing>'
> >>> import xml.dom.utils
> >>> reader = xml.dom.utils.FileReader()
> >>> import cStringIO
> >>> sio = cStringIO.StringIO(buffer)
> >>> dom = reader.readStream(sio)
> >>> dom.documentElement
> <Element 'thing'>
> >>> len(dom.documentElement.childNodes)
> 0
That's ok (unless you have a DTD for doctype "thing" which declares the
entity "foo"). In well-formed XML without such a declaration, only the
predefined entities (&amp;, &lt;, &gt;, &apos;, &quot;) may be referenced
-- replace &foo; by &amp; and it works.
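
A rough sketch of the three cases, again assuming xml.dom.minidom (which sits
on top of expat) rather than the reader from this thread; joined_text and the
replacement text "bar" are made up for the example. Note that expat rejects
the undeclared reference outright instead of returning an empty child list,
so the behaviour differs from the session quoted above:

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def joined_text(xml_source):
    """Parse xml_source and join the text children of the document element."""
    doc = minidom.parseString(xml_source)
    return "".join(
        node.data
        for node in doc.documentElement.childNodes
        if node.nodeType == node.TEXT_NODE
    )


# Case 1: &foo; is referenced but never declared.
undeclared = '<!DOCTYPE thing>\n<thing>&foo;</thing>'
try:
    joined_text(undeclared)
    undeclared_parsed = True
except ExpatError:
    undeclared_parsed = False

# Case 2: the internal DTD subset declares foo (hypothetical value "bar").
declared = '<!DOCTYPE thing [<!ENTITY foo "bar">]>\n<thing>&foo;</thing>'
declared_text = joined_text(declared)

# Case 3: a predefined entity needs no declaration at all.
predefined = '<!DOCTYPE thing>\n<thing>&amp;</thing>'
predefined_text = joined_text(predefined)

print(undeclared_parsed, repr(declared_text), repr(predefined_text))
```

In case 2 minidom expands the entity into plain text, so there is still no
entity reference node among the children of <thing>.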
>
> And here's a bug ;-) :
>
> >>> dom.documentElement.childNodes
> <NodeList]>
I'm not sure, but this could be caused by the last line of
xml.dom.core.SingleParentNodeList.__repr__(). I guess "-2" should be "-1"...
>
> -Fred
>
.co.
+------------------------------------------------------- daisy bytes! --------+
Carsten Oberscheid
co@daisybytes.su.uunet.de digital document processing
http://www.pweb.de/daisybytes.su electronic publishing