[XML-SIG] Anybody using PyXML (4DOM) HTML DOM?
John J Lee
jjl at pobox.com
Mon Aug 25 18:33:05 EDT 2003
Does anybody use PyXML's (4DOM's) HTML DOM implementation (including the
implementors themselves)?
Here are a couple of places where it looks clearly broken, which makes me
suspect that nobody but me is using it:
1. HTMLDocument.getElementsByName doesn't work at all for lower-case
attribute values (SF bug 782470):
#!/usr/bin/env python
from xml.dom.ext.reader import HtmlLib
doc = HtmlLib.FromHtml("""<html><head><title></title></head><body>
<form name="blah"></form>
</body></html>""")
# HTMLElement.getAttribute uppercases the name, but it was *stored*
# in lower case, so both fail.
print repr(doc.getElementsByName("blah"))
print repr(doc.getElementsByName("BLAH"))
I don't know how this should be fixed: case issues in HTML DOM seem
horribly complicated.
2. HTMLInputElement._get_type capitalisation is wrong.
xml/dom/html/HTMLInputElement.py says:
|     def _get_type(self):
|         return string.capitalize(self.getAttribute('TYPE'))
HTML DOM level 2 spec says:
| The type of control created (all lower case). See the type attribute
                               ^^^^^^^^^^^^^^
| definition in HTML 4.01.
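A small standalone comparison of the two behaviours, assuming the accessor only needs to normalise the raw attribute string. The function names get_type_current and get_type_lower are hypothetical, and str.capitalize stands in for the string.capitalize call quoted above, which has the same effect.

```python
# Standalone comparison; these helper names are hypothetical, not PyXML's.

def get_type_current(raw):
    # Same effect as string.capitalize(raw) in the quoted accessor:
    # first character upper case, the rest lower case.
    return raw.capitalize()


def get_type_lower(raw):
    # What the spec wording "(all lower case)" appears to ask for.
    return raw.lower()


for raw in ("text", "TEXT", "CheckBox"):
    print("%-10s %-10s %s" % (raw, get_type_current(raw), get_type_lower(raw)))
```

If that reading of the spec is right, swapping the capitalize call for a lower-casing one in _get_type would presumably be enough for this particular problem.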
John