[Patches] [ python-Patches-1704134 ] minidom Level 1 DOM compliance
noreply at sourceforge.net
Mon Apr 23 15:40:24 CEST 2007
Patches item #1704134, was opened at 2007-04-20 03:39
Message generated for change (Comment added) made by jorend
You can respond by visiting:
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Group: Python 2.6
Submitted By: Jason Orendorff (jorend)
Assigned to: Nobody/Anonymous (nobody)
Summary: minidom Level 1 DOM compliance
Tested on: Mac OS X 10.4.9
This patch fixes numerous bugs in xml.dom.minidom and expatbuilder.
It fixes all the small-to-middling bugs in minidom's DOM Level 1
compliance that I'm aware of; only large bugs remain (see below).
Changes: These are mainly fixes for bugs found by the W3C DOM Test
Suite for DOM Level 1. Python 2.5 fails over 120 of these tests; I
got the number down to 48.
- Exposed expat's XML_GetSpecifiedAttributeCount() as a method of
pyexpat parser objects. (This is needed to set Attr.specified
correctly.) Documented the new method in Doc/lib/libpyexpat.tex.
- Attributes that got default values from the DTD didn't show up in
the DOM. (This was a violation of the XML 1.0 spec! See
http://www.w3.org/TR/REC-xml/#proc-types section 5.1, "Validating
and Non-Validating Processors". Even non-validating processors
must "supply default attribute values" based on a certain subset
of the DTD!)
- Attr.specified is now set correctly. Before, it was always False,
even for attributes written explicitly in the document (those should
have been True).
- Inserting a node into one of its descendants caused an infinite
loop! :-) Now it raises HierarchyRequestErr, per the spec.
- Many error conditions specified in the DOM were not detected. In
particular, InvalidCharacterErr was never raised. The new version
does a lot more checking.
- Assigning to nodeValue is now a no-op for node types where it's
defined to be null.
- Document.createEntityReference() is implemented. It returns an
EntityReference node, but the node is not populated from the DTD.
(That is, the new EntityReference implementation is compliant as
far as it goes, but incomplete.)
- Element.removeAttributeNode(attr) now raises NotFoundErr if
attr belongs to some other Element and merely has the same name as
an attribute of this Element.
- Element.setAttributeNode() would sometimes return None erroneously.
- Element.removeAttributeNode() now returns the removed node.
- Several CharacterData methods would incorrectly throw if you
passed node.length as the index.
- Added Document.xmlVersion (from DOM Level 3). As DOM Level 3
specifies, it determines which XML version's name rules (1.0 or 1.1)
are used when checking for INVALID_CHARACTER_ERR.
- Added tests for all of the above.
- Removed trailing whitespace from lines in Lib/test/test_minidom.py.
- Deleted obsolete gc testing from test_minidom.
- In one or two places, broke very large asserts into many small
asserts. (I was debugging something. This change is inessential,
but it's a good change, so I kept it.)
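The infinite-loop fix comes down to an ancestry check before insertion. Below is a minimal sketch in modern Python 3, not the patch's code: append_child_checked is a hypothetical helper that walks parentNode links upward from the new parent and raises the standard xml.dom.HierarchyRequestErr if the node being inserted turns out to be that parent or one of its ancestors.

```python
import xml.dom
from xml.dom import minidom


def append_child_checked(parent, child):
    """Append child to parent unless that would create a cycle.

    Hypothetical helper (not minidom API): raises HierarchyRequestErr
    when child is parent itself or one of parent's ancestors.
    """
    node = parent
    while node is not None:          # walk up to the Document
        if node is child:
            raise xml.dom.HierarchyRequestErr(
                "cannot insert a node into itself or its descendants")
        node = node.parentNode
    return parent.appendChild(child)


doc = minidom.parseString("<a><b/></a>")
a = doc.documentElement
b = a.firstChild
try:
    append_child_checked(b, a)       # <a> would become a child of <b>
except xml.dom.HierarchyRequestErr:
    print("rejected")                # prints "rejected"
```

The walk costs O(depth) per insertion, which is cheap compared with the unbounded loop it prevents.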
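The removeAttributeNode() fix hinges on comparing node identity rather than attribute names. A sketch under that assumption; remove_attribute_node_strict is a hypothetical wrapper around minidom's existing getAttributeNode() and removeAttributeNode(), not part of the patch:

```python
import xml.dom
from xml.dom import minidom


def remove_attribute_node_strict(element, attr):
    """Remove attr from element only if it is element's own Attr node.

    Hypothetical helper: an Attr that merely shares a name with one of
    element's attributes raises NotFoundErr. Returns the removed node.
    """
    if element.getAttributeNode(attr.name) is not attr:   # identity, not name
        raise xml.dom.NotFoundErr("attr is not an attribute of this element")
    element.removeAttributeNode(attr)
    return attr


doc = minidom.parseString('<r><a x="1"/><b x="2"/></r>')
a, b = doc.documentElement.childNodes
try:
    remove_attribute_node_strict(a, b.getAttributeNode("x"))
except xml.dom.NotFoundErr:
    print(a.getAttribute("x"))       # prints 1: a's attribute survives
```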
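For the CharacterData methods, the DOM allows an offset equal to the length: it addresses the point just past the last character. The hypothetical helpers below work on plain strings rather than minidom nodes to keep the sketch short. Note that the DOM counts offsets in 16-bit units while Python strings count code points, so the two agree only for text in the Basic Multilingual Plane.

```python
import xml.dom


def substring_data(data, offset, count):
    """DOM-style substringData on a plain string (hypothetical helper).

    Offsets 0..len(data) inclusive are valid; offset == len(data) yields
    "". A count that runs past the end is clipped to the end.
    """
    if offset < 0 or offset > len(data) or count < 0:
        raise xml.dom.IndexSizeErr("offset or count out of range")
    return data[offset:offset + count]


def insert_data(data, offset, arg):
    """DOM-style insertData; offset == len(data) appends."""
    if offset < 0 or offset > len(data):
        raise xml.dom.IndexSizeErr("offset out of range")
    return data[:offset] + arg + data[offset:]


text = "abc"
print(repr(substring_data(text, len(text), 1)))   # prints ''
print(insert_data(text, len(text), "d"))          # prints abcd
```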
DOM Level 1 bugs remaining:
- A lot of the readonly properties are not implemented as readonly.
This would be easy to fix with new-style classes, but these are
old-style classes that use property() for a few things; I haven't
tried to understand that yet. I'm putting this off until
the present patch lands.
- All NodeLists should be live views, even the one returned by
getElementsByTagName(). It will be hard to fix this while retaining
pickle backward compatibility, and still harder to do it without
- Attribute nodes' nodeValue and childNodes are supposed to stay in
sync. This has all the same problems.
- EntityReference nodes should be populated with child nodes. The
descendants of EntityReference nodes should be readonly. This is
slightly less of a headache.
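On new-style classes, a property whose setter raises makes an attribute read-only (in Python 2, property setters are not honored on instances of old-style classes, which is the obstacle mentioned above). A minimal sketch in Python 3, where every class is new-style; readonly() and ToyNode are hypothetical names, and raising xml.dom.NoModificationAllowedErr on assignment is an assumed choice of exception:

```python
import xml.dom


def readonly(name):
    """Build a read-only property backed by self._<name> (hypothetical)."""
    def _get(self):
        return getattr(self, "_" + name)

    def _set(self, value):
        # A data descriptor's setter intercepts assignment on new-style
        # classes, so the write never reaches the instance __dict__.
        raise xml.dom.NoModificationAllowedErr(
            "attempt to modify read-only attribute %r" % name)

    return property(_get, _set)


class ToyNode(object):
    nodeType = readonly("nodeType")

    def __init__(self, node_type):
        self._nodeType = node_type   # set the backing field directly


n = ToyNode(1)
try:
    n.nodeType = 3
except xml.dom.NoModificationAllowedErr:
    print(n.nodeType)                # prints 1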
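A live NodeList can be approximated by a view that re-walks the tree on every access instead of storing a snapshot. A sketch, assuming a hypothetical LiveElementsByTagName class layered on minidom nodes; it trades O(n) work per access for always-current results, and its only state is the root node and the tag name:

```python
from collections.abc import Sequence
from xml.dom import Node, minidom


class LiveElementsByTagName(Sequence):
    """Live view of the descendant elements of root named tag_name.

    Hypothetical sketch: each access re-walks the subtree in document
    order, so results always match the current tree (O(n) per access).
    """

    def __init__(self, root, tag_name):
        self._root = root
        self._tag_name = tag_name

    def _matches(self):
        stack = list(reversed(self._root.childNodes))
        while stack:
            node = stack.pop()
            if node.nodeType == Node.ELEMENT_NODE and (
                    self._tag_name == "*" or node.tagName == self._tag_name):
                yield node
            stack.extend(reversed(node.childNodes))   # preorder traversal

    def __len__(self):
        return sum(1 for _ in self._matches())

    def __getitem__(self, index):
        return list(self._matches())[index]

    # DOM NodeList interface on top of the Sequence protocol.
    @property
    def length(self):
        return len(self)

    def item(self, index):
        return self[index] if 0 <= index < len(self) else None


doc = minidom.parseString("<r><a/><x><a/></x></r>")
live = LiveElementsByTagName(doc.documentElement, "a")
snapshot = doc.getElementsByTagName("a")
doc.documentElement.appendChild(doc.createElement("a"))
print(live.length, len(snapshot))    # prints 3 2
```

By contrast, the list returned by minidom's getElementsByTagName() is built once at call time and does not grow when matching elements are added later.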
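One way to keep an attribute's nodeValue and childNodes in sync is to store only the children and derive the value from them on every read. A toy sketch; ToyAttr and ToyText are hypothetical, and entity-reference children are ignored:

```python
class ToyText:
    """Minimal stand-in for a Text child node (hypothetical)."""

    def __init__(self, data):
        self.data = data


class ToyAttr:
    """Hypothetical Attr that stores only children and derives its value."""

    def __init__(self, name, value=""):
        self.name = name
        self.childNodes = []
        self.nodeValue = value       # routed through the setter below

    @property
    def nodeValue(self):
        # Recomputed on every read, so it cannot drift from childNodes.
        return "".join(child.data for child in self.childNodes)

    @nodeValue.setter
    def nodeValue(self, value):
        # Setting the value replaces all children with one Text node.
        self.childNodes = [ToyText(value)] if value else []

    value = nodeValue                # Attr.value is the same string


attr = ToyAttr("x", "ab")
attr.childNodes.append(ToyText("c"))
print(attr.value)                    # prints abc
```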
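Populating EntityReference nodes needs each entity's replacement text, which expat already reports through pyexpat's EntityDeclHandler. A simplified sketch with two hypothetical helpers: one collects internal general entities, the other parses replacement text into nodes owned by a target document. It assumes the text is a well-formed fragment with no nested entity references, and it does not make the resulting nodes read-only.

```python
import xml.parsers.expat
from xml.dom import minidom


def internal_entities(xml_text):
    """Map each internal general entity name to its replacement text."""
    found = {}

    def on_entity_decl(name, is_parameter, value, base, system_id,
                       public_id, notation_name):
        # value is None for external entities; skip those and PEs.
        if not is_parameter and value is not None:
            found[name] = value

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_text, True)
    return found


def replacement_nodes(doc, text):
    """Parse replacement text into nodes owned by doc.

    Simplification: assumes text is a well-formed fragment without
    entity references of its own.
    """
    fragment = minidom.parseString("<wrapper>%s</wrapper>" % text)
    return [doc.importNode(node, True)
            for node in fragment.documentElement.childNodes]


src = '<!DOCTYPE r [<!ENTITY e "hello <b>world</b>">]><r/>'
doc = minidom.parseString(src)
nodes = replacement_nodes(doc, internal_entities(src)["e"])
print([n.nodeName for n in nodes])   # prints ['#text', 'b']
```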
I haven't even tried to run DOMTS level2 tests yet. I'm sure it'll be
>Comment By: Jason Orendorff (jorend)
Date: 2007-04-23 09:40
Logged In: YES
Justification for the most significant changes:
1. I added a __setattr__() method to class xml.dom.minidom.Node. This
means existing code that subclasses any of the Node classes *and* overrides
__setattr__() *and* isn't calling the base class __setattr__... will not
get the new __setattr__ functionality, which basically implements a
DOM-compatibility quirk. I think it's OK. :)
2. There's a flag in pyexpat that lets you turn off default attribute
values. The flag is marked "use with caution", because turning this off is
not XML-compliant. expatbuilder was using the flag. I can't tell why.
There was no comment in the code. I think it was just a mistake; I changed
it, and minidom passes more DOMTS tests as a result.
3. To implement Attr.specified, I had to expose another Expat API via
pyexpat. This was actually a pretty minor change, but I mention it because
I'm sure anyone opening this patch will be shocked that it touches any C
4. Added some big regexes to check for non-XML-compliant names and raise
InvalidCharacterErr. I didn't see a better way to do this. If anyone is
worried about the performance hit from doing these checks, I'll look at
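Point 1 can be illustrated with toy classes (hypothetical names, not minidom's): a base-class __setattr__ that turns assignments to nodeValue into no-ops for node types where the DOM defines it as null, and a subclass that overrides __setattr__ without calling the base class and so silently loses that behavior.

```python
class ToyNode:
    """Toy stand-in for a DOM node class (hypothetical, not minidom)."""
    nodeValue = None
    _value_is_null = True      # DOM defines nodeValue as null for this type

    def __setattr__(self, name, value):
        if name == "nodeValue" and self._value_is_null:
            return             # per the DOM, the assignment has no effect
        super().__setattr__(name, value)


class ToyElement(ToyNode):
    pass


class ToyText(ToyNode):
    _value_is_null = False

    def __init__(self, data):
        self.nodeValue = data


class CarelessElement(ToyElement):
    # Overrides __setattr__ without calling the base class, so the
    # nodeValue quirk is lost: the compatibility risk in point 1.
    def __setattr__(self, name, value):
        self.__dict__[name] = value


e = ToyElement()
e.nodeValue = "x"
c = CarelessElement()
c.nodeValue = "x"
print(e.nodeValue, c.nodeValue)      # prints None x
```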
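The flag in point 2 appears to be pyexpat's specified_attributes attribute; that identification is an assumption based on the description. The sketch below parses the same document with the flag off and on, and the difference is the set of attributes supplied from DTD defaults. Parsing twice is only a demonstration, not how the patch sets Attr.specified (it exposes expat's specified-attribute count instead), and the sketch does not assume that new method is available.

```python
import xml.parsers.expat

# Element a gets attribute b by default from the internal DTD subset.
DOC = '<!DOCTYPE a [<!ATTLIST a b CDATA "dflt">]><a c="1"/>'


def root_attributes(xml_text, specified_only):
    """Return the attributes expat reports for the first start tag."""
    parser = xml.parsers.expat.ParserCreate()
    parser.specified_attributes = specified_only
    seen = []
    parser.StartElementHandler = lambda name, attrs: seen.append(attrs)
    parser.Parse(xml_text, True)
    return seen[0]


full = root_attributes(DOC, specified_only=False)     # includes defaults
explicit = root_attributes(DOC, specified_only=True)  # start tag only
defaulted = set(full) - set(explicit)
print(sorted(defaulted))             # prints ['b']
```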
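The name checks in point 4 can be expressed as one regular expression compiled once at import time. A sketch using the XML 1.0 Fifth Edition NameStartChar and NameChar productions; the 2007 patch presumably used the older, much larger character tables, so the two disagree on some non-ASCII characters. check_name is a hypothetical helper, and the choice between XML versions via Document.xmlVersion is not modeled.

```python
import re
import xml.dom

# XML 1.0 Fifth Edition productions NameStartChar and NameChar.
_NAME_START = (
    r"A-Z_a-z:\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    r"\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
    r"\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
_NAME_CHAR = _NAME_START + r"\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"
_NAME_RE = re.compile("[%s][%s]*" % (_NAME_START, _NAME_CHAR))


def check_name(name):
    """Return name if it is an XML Name, else raise InvalidCharacterErr."""
    if _NAME_RE.fullmatch(name) is None:
        raise xml.dom.InvalidCharacterErr("not an XML name: %r" % (name,))
    return name


print(check_name("xml:lang"))        # prints xml:lang
```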