[XML-SIG] 4DOM HTML erroneously using NS methods

Jeremy Kloth jeremy.kloth@fourthought.com
Thu, 17 Jan 2002 11:40:04 -0700


From: "Sylvain Thenault" <Sylvain.Thenault@logilab.fr>
> On Wed, 16 Jan 2002, Uche Ogbuji wrote:
>
> > In dom.ext.reader.Sgmlop.py and other places, the NS methods are
> > erroneously being used for HTML DOM manipulation.  This breaks apps
> > that are expecting the resulting nodes to have the mandated HTML DOM
> > behavior (i.e. uppercase-normalized tagnames and the like).  In fact:
> >
> > $ grep NS\( dom/html/*
> > grep: dom/html/CVS: Is a directory
> > dom/html/HTMLDocument.py:    def createElementNS(self, namespace, qname):
> > dom/html/HTMLElement.py: self.attributes.setNamedItemNS(clone)
> > $
> >
> > These shouldn't even be defined for HTML interfaces.
>
> why ? Doesn't HTMLDocument inherit from the standard DOM level 2 Document?
>
> > The problem is that it would be a bit of a chore to carefully change
> > things back so that the XML modules strictly use NS methods, and the
> > HTML modules the non-NS modules.  It would probably be easiest for
> > whoever made the changes to undo them.
>
> I made the changes in dom/ext/reader/Sgmlop.py and
> dom/html/HTMLDocument.py. Those can easily be reverted, but I think
> it will take much longer to be sure that all XML modules strictly use
> NS methods and HTML modules strictly use non-NS methods.
>

All the convenience methods defined on HTML elements expect the
attributes to be stored via setAttribute, not the NS version.  It would be
quite an undertaking to change all of those.  Another problem is that
attribute names are not uppercased as element names are.
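
To illustrate the mismatch, here is a minimal Python sketch.  ToyElement
is a hypothetical model, not 4DOM's actual code, and it assumes the NS and
non-NS attribute stores are kept separately.  It shows how a convenience
accessor that reads through getAttribute can miss a value that a parser
stored through setAttributeNS:

```python
class ToyElement:
    """Hypothetical element model; NOT the real 4DOM implementation."""

    def __init__(self, tag_name):
        self.tagName = tag_name
        self._attrs = {}      # non-NS store, keyed by attribute name
        self._attrs_ns = {}   # NS store, keyed by (namespaceURI, localName)

    def setAttribute(self, name, value):
        self._attrs[name] = value

    def getAttribute(self, name):
        return self._attrs.get(name, "")

    def setAttributeNS(self, namespace_uri, qname, value):
        local_name = qname.split(":")[-1]
        self._attrs_ns[(namespace_uri, local_name)] = value

    def getAttributeNS(self, namespace_uri, local_name):
        return self._attrs_ns.get((namespace_uri, local_name), "")

    @property
    def href(self):
        # Convenience accessor in the style of the HTML DOM interfaces:
        # it only consults the non-NS store.
        return self.getAttribute("href")


# A reader that builds the node with the NS method ...
anchor_from_parser = ToyElement("A")
anchor_from_parser.setAttributeNS(None, "href", "http://4suite.org")

# ... versus a script that uses the plain HTML DOM method.
anchor_from_script = ToyElement("A")
anchor_from_script.setAttribute("href", "http://4suite.org")

print(repr(anchor_from_parser.href))   # empty: the accessor misses the NS store
print(repr(anchor_from_script.href))   # the value set via setAttribute
```

In this toy model the value is still reachable through getAttributeNS, but
every convenience accessor would have to be changed to look there.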

>  Originally, I made the change in Sgmlop to be able to give Sgmlop a
> pDomlette document (which doesn't implement the non-NS methods).
>  Then I overrode the createElementNS method in HTMLDocument to
> delegate in the same manner as createElement, so I don't think it breaks
> the standard HTML DOM behavior if you give HTMLParser an HTMLDocument.
>

Elements generally work either way, at least when using the null namespace.
However, the big offender is attributes.  Here is an example HTML document
that currently will not parse:

<html xml:lang="en">
  <head><title>Broken</title></head>
  <body>HTML doesn't care about prefixes!</body>
</html>
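
For comparison, a namespace-unaware HTML parser treats "xml:lang" as an
ordinary attribute name.  The sketch below uses html.parser from the
Python 3 standard library, not sgmlop or 4DOM, so it only illustrates the
expected non-NS behavior; StartTagCollector is a hypothetical helper name:

```python
from html.parser import HTMLParser


class StartTagCollector(HTMLParser):
    """Hypothetical helper that records each start tag with its attributes."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; the name is kept whole,
        # with no attempt to resolve the "xml" prefix to a namespace.
        self.start_tags.append((tag, attrs))


collector = StartTagCollector()
collector.feed(
    '<html xml:lang="en">'
    "<head><title>Broken</title></head>"
    "<body>HTML doesn't care about prefixes!</body>"
    "</html>"
)
collector.close()

html_tag, html_attrs = collector.start_tags[0]
print(html_tag, html_attrs)
```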

That said, the HTML DOM implementation was written from the November
2000 working draft (before XHTML).  The newest draft, December 2001,
addresses XHTML and the ability to work with either.  So we can either
update all of the HTML interfaces to follow the current WD (which the
code violates by uppercasing names at construction time rather than at
access time) or revert the code to follow the older draft.  But I think
being half and half is just wrong, and rather confusing.
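
To make the construction-time versus access-time distinction concrete,
here is a small Python sketch.  Both classes are hypothetical and are not
taken from 4DOM or from either draft; the is_html flag is an assumption
standing in for whatever tells an HTML document from an XHTML one:

```python
class EagerElement:
    """Uppercases at construction time; the original case is lost."""

    def __init__(self, name):
        self.tagName = name.upper()


class LazyElement:
    """Stores the name as given and normalizes only when it is read."""

    def __init__(self, name, is_html):
        self._name = name
        self._is_html = is_html   # assumed flag: HTML vs. XHTML document

    @property
    def tagName(self):
        return self._name.upper() if self._is_html else self._name


eager = EagerElement("body")
lazy_html = LazyElement("body", is_html=True)
lazy_xhtml = LazyElement("body", is_html=False)

print(eager.tagName)        # BODY
print(lazy_html.tagName)    # BODY
print(lazy_xhtml.tagName)   # body
```

With access-time normalization the same element class can serve both
kinds of document, since the stored name keeps its original case.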

--
Jeremy Kloth                              Consultant
jeremy.kloth@fourthought.com              +1 303 583 9900 x 105
Fourthought, Inc.                         http://fourthought.com
4735 East Walnut St, Boulder, CO 80301-2537, USA
XML strategy, XML tools (http://4suite.org), knowledge management