[XML-SIG] SAX Namespaces

Fred L. Drake, Jr. fdrake@beopen.com
Thu, 6 Jul 2000 12:45:22 -0400 (EDT)


Greg Stein wrote:
 >   iv)  {(URI, localname) : (qname, value), ...}
 > 
 > Using (iv) means that the passed attribute dictionary is immediately usable.
 > The other forms require some initial processing, yet provide no value-add.

Paul Prescod writes:
 > It is immediately usable as a dictionary, but it must be converted to a
 > list for apps that want to iterate over attributes. Examples include
...
 > 1. Many (most?) apps turn the dictionary into a list immediately.

  An unexpected observation!  When we were working on Grail, the lists
of attributes returned by sgmllib/htmllib were a substantial
nuisance, and we *really* wanted dictionaries.  The problem of
looping over the attributes to get the ones we wanted was enough
to make us fork the modules from the standard library and create the code
that's in the later versions of Grail (see the grail/src/sgml
directory in the Grail CVS tree at SourceForge), which was *much*
easier to work with.
  Perhaps there's a split here between general tools that work on
arbitrary XML and "applications" that don't care about the XML but
only need to extract the information to solve some specific problem?
That actually seems fairly likely to me, on first thought.

 > 3. Dictionary building and populating is more expensive than list
 > building.

  But still trivial compared to actually doing anything interesting
with the attribute values.

 > 4. Attribute lists are typically so small (two or three items) that it
 > is debatable whether a hashtable is the right index structure for them
 > anyhow. Maybe linear search is better for a lot of apps. Maybe "lazy"
 > indexing is better. I'd rather leave it up to the app.

  Whether this is the most efficient structure is only part of it --
the usage pattern we observed in Grail was that we'd set up default
values in locals, loop over the attribute list to override them, and
then use the locals while doing whatever we needed to do.  It was a
real pain if we needed to branch on one attribute and then only use
some others in one branch or another; we still had to loop and extract
first, and then do the application work.  We couldn't branch on the
one that mattered, and then get the others only as needed.  Unless we
looped more than once, which is heinous.
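
  To make the difference concrete, here is a minimal sketch of the two
patterns.  This is not the Grail code; the tag, the attribute names,
and both handler functions are made up for illustration.

```python
# Hypothetical attributes for an <img> start tag, as a parser might
# deliver them in list form.
attrs_list = [("src", "logo.gif"), ("align", "left"), ("width", "120")]

def handle_img_from_list(attrs):
    # List form: set defaults, then loop over every attribute up front,
    # including ones that only a single branch will use.
    src = None
    align = "bottom"
    width = None
    for name, value in attrs:
        if name == "src":
            src = value
        elif name == "align":
            align = value
        elif name == "width":
            width = value
    if align == "left":
        return ("float-left", src, width)
    return ("inline", src)

def handle_img_from_dict(attrs):
    # Dictionary form: branch on the attribute that matters, then fetch
    # the others only in the branch that needs them.
    if attrs.get("align", "bottom") == "left":
        return ("float-left", attrs.get("src"), attrs.get("width"))
    return ("inline", attrs.get("src"))

list_result = handle_img_from_list(attrs_list)
dict_result = handle_img_from_dict(dict(attrs_list))
```

  Both handlers compute the same result; the dictionary version just
never has to extract "width" unless the "left" branch is taken.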

 > 5. Pyexpat delivers the attributes as a list. Python 1.7 might just wrap
 > the pyexpat data structure as a sequence rather than copying the
 > attributes out (admittedly, more research is needed...!)

  If this really is an efficiency problem, then perhaps creating a
highly efficient AttributeList implementation in C is worth the
effort; otherwise, something that allows random access to attributes
by name (such as the AttributeList) in Python is fine.  Lists of
attributes seem really hard to work with.
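
  One possible middle ground, sketched below under my own assumptions:
keep the (name, value) pairs the parser delivers and build a name
index only when somebody first asks for an attribute by name.  The
class name PairBackedAttributes is hypothetical; this is not the SAX
AttributeList interface or anything pyexpat provides.

```python
class PairBackedAttributes:
    """Hypothetical wrapper around a sequence of (name, value) pairs.

    Iteration uses the pairs directly; a dictionary index is built
    lazily on the first lookup by name.
    """

    def __init__(self, pairs):
        self._pairs = list(pairs)
        self._index = None  # not built until a by-name lookup happens

    def __iter__(self):
        # Tools that just walk the attributes get them in original order.
        return iter(self._pairs)

    def __len__(self):
        return len(self._pairs)

    def _lookup(self):
        # Build the name -> value index on first use only.
        if self._index is None:
            self._index = dict(self._pairs)
        return self._index

    def __getitem__(self, name):
        return self._lookup()[name]

    def get(self, name, default=None):
        return self._lookup().get(name, default)

    def __contains__(self, name):
        return name in self._lookup()


attrs = PairBackedAttributes([("href", "x.html"), ("title", "X")])
pairs_seen = list(attrs)   # iteration does not build the index
href = attrs["href"]       # first by-name access builds it
```

  Applications that only iterate never pay for the dictionary, and
those that want random access get it without writing the loop.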


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at beopen.com>
BeOpen PythonLabs Team Member