[XML-SIG] Efficient Namespace Handling

Lars Marius Garshol larsga@garshol.priv.no
30 Jun 2000 23:46:16 +0200


* Paul Prescod
| 
| In namespace mode PyExpat should produce tuples: (URI, localname,
| rawname)
| 
| Those should be passed as the "name" parameter to SAX event handlers of
| the form:
| 
| def startElement((URI,localname,rawname), attrs):...
| 	...

I think this is very wrong, and that we shouldn't do it. I've looked
through the SAX code in the Python CVS tree and this is the only thing
I'm really unhappy about.

Let me enumerate the reasons:

 - the rawname is not really part of the element name, so it does not
   feel right to have it in the tuple like that

 - it makes name comparison (the most common operation on names!)
   really ugly:

     if name[0] == xslt_ns and \
        name[1] == 'template':
        # do something useful

   compared to:

     if name == xslt_template:

   or:
 
     if name == (xslt_ns, 'template'):

 - dispatching also becomes ugly:

     self._element_handlers[(name[0], name[1])] ()

   compared to:

     self._element_handlers[name] ()

 - it makes it harder to make people understand that the prefix used
   in the XML document is not part of the element name

 - I don't see anything that is made easier or faster by this
   representation

 - we already discussed this and decided for something else; I think
   we should not change our minds without good reason


The one thing I like about the representation is that it simplifies
the Attributes interface, which is the weak point of SAX 2.0 as it
stands. See below for more on that.
 
| Those can in turn be passed to the DOM createElement method which
| can check the type of its first parameter and "do the right thing"
| when it is a tuple. This is more efficient than the DOM's
| createElementNS method which requires string manipulations.
 
I don't fully understand this argument. What benefits do you see over
the (uri, localname), qname representation? Also, don't you think any
speed gains will be lost here in the cost from all the method calls
and object instantiations?

| The second issue is an efficient way to pass around attributes. Note
| that I am not talking about how to query or fetch attributes. Just
| how to pass them around. The obvious representation for an attribute
| value is ((URI,localname,rawname),value). Tuples and list are much,
| much faster to create than instances in Python. Java doesn't really
| have equivalents. I propose that in beta 2 of Python, PyExpat in
| namespace mode should pass list structures of the form:
| 
| [((URI,localname,rawname),value),...] 
| 
| I choose not to use dictionaries because it isn't clear whether to
| index on rawnames or localname/URI tuples. It depends on the
| application so it is better to build dictionary-based indexes at the
| application level.

I am very unhappy about this. While SAX makes few concessions to
convenience I see little reason to be downright obfuscatory. I was
really concerned about the speed impact of creating an AttributesImpl
object per startElement invocation too. 

However, after benchmarking the speed of a real application
(Shakespeare to HTML converter) which created a new instance for each
call against one that recycled the old instance (but updated the
internal attributes) I gave this up. I think the speed increase was
from 99 to 96 seconds for converting all of Shakespeare's plays.

In other words: not worth the cost, in my opinion. Most likely you
will get greater speedups by replacing the current

  xmlreader.AttributesImpl(...)

with

  AttributesImpl(...)

than from this change...

I'm sorry to write an email that may seem so harsh, but I am really
convinced that what you are proposing is really really wrong and that
it is very important that we don't do this. The end result will, I
think, be to make SAX a real pain to use, and I think the speed
benefit is likely to be less than 5% for a reasonable application.

Hoping this is not too late...

--Lars M.