[XML-SIG] SAX Namespaces

Paul Prescod paul@prescod.net
Thu, 06 Jul 2000 20:42:33 -0500


Greg Stein wrote:
> 
> ...
>
> > Minidom uses the same first line, and so do a bunch of our other sample
> > programs.
> 
> That is an improper basis for your claim. I do the .items() because qp_xml
> does the namespace processing itself. If Expat does the processing, then I
> would no longer need a lot of the work in qp_xml.Parser.start. And I
> certainly would never do the .items() any more.

If the interface you want to expose to your users is 

((uri,localname)->(rawname,value))

then qp_xml is fairly unusual in that regard. Attribute-using code in
such an environment is likely to be pretty ugly.
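
To make the shape concrete, here is a minimal sketch of attribute-using
code against that mapping. The URI, the attribute names, and the
get_value helper are all made up for illustration; none of this is
qp_xml's actual API.

```python
# Hypothetical attribute mapping in the shape under discussion:
# {(uri, localname): (rawname, value)}. URIs and names are invented.
XLINK = "http://www.w3.org/1999/xlink"

attrs = {
    (XLINK, "href"): ("xlink:href", "chapter2.xml"),
    (None, "id"): ("id", "intro"),
}

def get_value(attrs, uri, localname, default=None):
    """Return just the attribute value, discarding the rawname."""
    entry = attrs.get((uri, localname))
    if entry is None:
        return default
    return entry[1]

# Every lookup has to unpack the (rawname, value) tuple, either through
# a helper or by indexing with [1] at each call site.
href = get_value(attrs, XLINK, "href")
ident = attrs[(None, "id")][1]
```

The [1] at every call site (or a helper to hide it) is the sort of
noise I mean.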

The DOM says that the referent value is either a node or a string, not a
rawname/value or prefix/value tuple. XPath, XPointer, XSLT say the same.
 
> It is easy to transform a dictionary to a list. The other direction is much
> harder.

Either is easy. It's just a question of which will happen more. My guess
is that most of the time we will transform Expat's list into a
dictionary and then convert that to a list (to iterate over it) and then
convert that back to a DOM-specific, Pyxie-specific etc. data structure.
Fred wants to prove me wrong by making a data structure we'll all like. 
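
A rough sketch of the round trip I am worried about, assuming the
attributes start out as the flat name/value list that the C Expat
library passes to its start-element callback. The helper names and the
Attr class are hypothetical stand-ins, not anything from an existing
API.

```python
# Assumed input: a flat [name, value, name, value, ...] list, which is
# how C Expat hands attributes to its start-element callback.
flat = ["id", "intro", "class", "note"]

def flat_to_dict(flat):
    """Pair up alternating names and values into a dictionary."""
    return dict(zip(flat[0::2], flat[1::2]))

def dict_to_pairs(mapping):
    """Turn the dictionary back into a list of (name, value) pairs."""
    return list(mapping.items())

class Attr:
    """Hypothetical stand-in for a DOM- or Pyxie-specific attribute object."""
    def __init__(self, name, value):
        self.name = name
        self.value = value

as_dict = flat_to_dict(flat)        # step 1: list -> dictionary
as_pairs = dict_to_pairs(as_dict)   # step 2: dictionary -> list, to iterate
nodes = [Attr(name, value) for name, value in as_pairs]  # step 3: API objects
```

Steps 1 and 2 cancel each other out whenever the consumer only wants to
iterate.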

You just want to say that the provided data structure is "good enough"
when it simply isn't enough for most APIs. I feel pretty confident that
((uri,localname):(rawname,value)) is not going to be a popular
representation in higher level APIs, and perhaps even among SAX
programmers.

> How could anybody do a lookup based on a qname? There is no way to know the
> prefix. 

I have megabytes of documents where I know the locations of every
line-feed. Prefixes are not that mysterious.

The W3C has decided that it is appropriate in specs "above XML" to query
and navigate based on the prefix even if namespace processing is turned
on. Even if we decided that that decision is questionable here, there is
nothing we can do about it. Minidom (for one) indexes on both qname and
uri/localname pair. The user may use this facility to blow their feet
off but they might also have good reason for doing so.
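
For illustration, the double indexing looks like this with the minidom
that ships in Python's standard library today (which may differ in
detail from the version discussed here). The namespace URI and the
attribute are invented for the example.

```python
from xml.dom import minidom

# The namespace URI and attribute below are made up for the example.
doc = minidom.parseString(
    '<doc xmlns:x="http://example.com/ns" x:kind="demo"/>'
)
el = doc.documentElement

# Lookup by qname (prefix:localname) ...
by_qname = el.getAttribute("x:kind")

# ... and lookup by (uri, localname) reach the same attribute.
by_ns = el.getAttributeNS("http://example.com/ns", "kind")
```

The qname lookup only works if you know the prefix the document
happened to use, which is exactly the foot-shooting risk mentioned
above.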

> If you're talking about the "xml:" prefix, then you also know the
> URI, so the lookup on a (URI, name) is a cakewalk.

Actually, it didn't occur to me until you mentioned it, but that isn't
true. The string you mention is not by definition bound to the xml
prefix. On the other hand, the xml:* attributes are defined based on
their rawnames, not their URI.

> > 3. Dictionary building and populating is more expensive than list
> > building.
> 
> Eh? How is that?

That was what my tests from Python code showed, but on further testing I
see that minor variations in the code can shift it around. In
particular, lists were slower if you used "append" instead of
precomputing the length of the list.
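
A sketch of the kind of comparison I mean, written against the standard
timeit module. This is not the original test code; the function names
and the ten-attribute sample are assumptions, and the numbers will vary
by machine and interpreter version.

```python
import timeit

# Hypothetical sample: ten attribute name/value pairs.
PAIRS = [("name%d" % i, "value%d" % i) for i in range(10)]

def build_list_append(pairs):
    """Grow a flat [name, value, ...] list with append()."""
    out = []
    for name, value in pairs:
        out.append(name)
        out.append(value)
    return out

def build_list_prealloc(pairs):
    """Allocate the flat list at its final length, then fill it in."""
    out = [None] * (2 * len(pairs))
    i = 0
    for name, value in pairs:
        out[i] = name
        out[i + 1] = value
        i += 2
    return out

def build_dict(pairs):
    """Populate a name -> value dictionary."""
    out = {}
    for name, value in pairs:
        out[name] = value
    return out

def compare(number=20000):
    """Return seconds taken by each builder over `number` runs."""
    return {
        f.__name__: timeit.timeit(lambda: f(PAIRS), number=number)
        for f in (build_list_append, build_list_prealloc, build_dict)
    }

if __name__ == "__main__":
    for name, seconds in compare().items():
        print(name, seconds)
```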

> And we are talking mostly about convenience for the Python programmer here.
> Shaving a few cycles of C code is moot w.r.t. what the Python result is.

There is no convenient built-in data structure. I certainly don't think
having "values" of (prefix,value) is convenient. Actually, the first
version of minidom code did something like that but the code was pretty
ugly. My point is that if we stick to Python's primitive data types then
copying the attributes out will be the rule, not the exception.

> We're talking about delivering the right semantic to the Python user. Expat
> doesn't have dictionaries, so it must deliver them that way. 
> We are under no requirement to match it exactly.

I didn't claim we were. I said that among other things, one benefit of
doing it this way is mirroring Expat.

> It should be the prefix, not the qname. But yes: it isn't as intuitive as it
> could be. But the (URI, name) key is definitely intuitive. It also stresses
> the simple fact that you can only have one key/value for a particular
> attribute. The semantics are a much better match.

Several specs in the XML family say that the right semantic is double
indexing. I need to support all of the specs in the family. Even
ignoring that, I don't believe that there is a single existing API that
uses the mapping structure you propose. We would have to copy the values
out or "wrap" if only to be backwards-compatible.
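
As a sketch of what "wrapping" might involve, here is a hypothetical
read-only view over the proposed mapping that adds the second index.
The class and method names are invented for illustration and do not
come from any existing API.

```python
class AttributesView:
    """Hypothetical wrapper over {(uri, localname): (rawname, value)}."""

    def __init__(self, ns_attrs):
        self._by_name = ns_attrs
        # Secondary index, built once per element, keyed by rawname.
        self._by_qname = {raw: value for raw, value in ns_attrs.values()}

    def get_by_name(self, uri, localname):
        """Look up a value by its (uri, localname) pair."""
        return self._by_name[(uri, localname)][1]

    def get_by_qname(self, rawname):
        """Look up a value by its rawname (prefix:localname)."""
        return self._by_qname[rawname]

# The URI and attribute are made up for the example.
view = AttributesView(
    {("http://example.com/ns", "kind"): ("x:kind", "demo")}
)
```

Building the second index is itself a copy of every attribute, so the
wrapper does not avoid the cost; it only hides it.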

It wouldn't be the end of the world if I had to do a .items() for every
element, but I would be annoyed to find six months from now that most
apps are doing the .items(), in which case the list should have been the
data structure in the first place.

-- 
 Paul Prescod - Not encumbered by corporate consensus
Pop stars come and pop stars go, but amid all this change there is one
eternal truth: Whenever Bob Dylan writes a song about a guy, the guy is
guilty as sin.
	- http://www.nj.com/page1/ledger/e2efc7.html