[XML-SIG] easySAX

Lars Marius Garshol larsga@ifi.uio.no
21 May 1999 20:51:43 +0200


* Lars Marius Garshol 
| 
| What do people think? Is this better than adding the suggested
| improvements to the SAX core? This was just hacked together in 15
| minutes, so please don't hesitate to slaughter it if you don't like
| it.

* Paul Prescod
| 
| I'm not thrilled with the fact that it requires an explicit adapter
| instead of a simple base class. 

Hmmm. Why do you see this as a disadvantage? Part of the reason I did
it as I did is that I want the user to be able to redefine
startElement and endElement without messing up the framework. Knowing
why you don't like the adapter would make the tradeoff easier.

| My counter-proposal is that easySax be a base class that defines
| startElement, endElement and characters.
| 
| easySax "clients" would define start_Foo, end_Foo,..., startUnknown,
| endUnknown, processingInstruction and "text", where text is defined
| as a Python programmer would expect: as a simple string without the
| index junk.

Hmmm. I feel uneasy about the *Unknown methods, but I suppose they will
have their uses.
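
To make the counter-proposal concrete, here is a rough sketch of how such
a base class might dispatch to start_Foo/end_Foo methods and fall back to
the *Unknown ones. It is written against the ContentHandler class in the
standard xml.sax module; the names EasySax and TitleLister, the callback
signatures, and the events list are assumptions made for illustration,
not the actual easySAX interface.

```python
import xml.sax


class EasySax(xml.sax.ContentHandler):
    """Hypothetical base class: routes SAX events to per-element methods."""

    def startElement(self, name, attrs):
        # Look for start_<name>; fall back to startUnknown if it is missing.
        method = getattr(self, "start_" + name, None)
        if method is not None:
            method(attrs)
        else:
            self.startUnknown(name, attrs)

    def endElement(self, name):
        # Same lookup for end_<name>, with endUnknown as the fallback.
        method = getattr(self, "end_" + name, None)
        if method is not None:
            method()
        else:
            self.endUnknown(name)

    # Default fallbacks do nothing, so clients only override what they need.
    def startUnknown(self, name, attrs):
        pass

    def endUnknown(self, name):
        pass


class TitleLister(EasySax):
    """Example client: records which callbacks fire, in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def start_title(self, attrs):
        self.events.append(("start", "title"))

    def end_title(self):
        self.events.append(("end", "title"))

    def startUnknown(self, name, attrs):
        self.events.append(("unknown", name))


handler = TitleLister()
xml.sax.parseString(b"<doc><title>easySAX</title></doc>", handler)
print(handler.events)
```

Note that in this sketch the client gives up startElement and endElement
to the base class, which is exactly the tradeoff discussed above.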
 
| What you do with captured text is highly context specific. What if
| we had TITLE_text, BODY_text, FOO_text and Unknowntext. Then if
| Unknowntext isn't defined we wouldn't be storing away little useless
| text snippets all of the time (e.g. if we were just looking for
| titles).

Paul, thank you! I think this is the idea I've been looking for ever
since I started thinking about making something like easySAX. If we
pass in attributes as well here it means that for the small leaf
elements (which in data-oriented XML are usually the important ones)
you have all the information you need in one callback.

It would also mean that passing unsliced strings to characters in
the real SAX will probably pay off, since as you say we will now only
slice the strings when we actually need them.

(Except I think I prefer text_TITLE and textUnknown.)
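
A minimal sketch of the text_Foo idea, to show where the slicing would
happen. It assumes a characters(data, start, length) callback that
receives the unsliced buffer plus indexes, and it passes the element's
attributes along with the collected text. The class names, the
(attrs, text) signature, and the hand-driven event calls at the bottom
are all made up for illustration; no real parser is involved.

```python
class LazyTextSax:
    """Hypothetical sketch: only slices text for elements that want it."""

    def __init__(self):
        # One [handler, attrs, parts] entry per currently open element.
        self._stack = []

    def _find_text_handler(self, name):
        # Prefer text_<name>; otherwise use textUnknown if the client has one.
        method = getattr(self, "text_" + name, None)
        if method is not None:
            return method
        unknown = getattr(self, "textUnknown", None)
        if unknown is not None:
            return lambda attrs, text: unknown(name, attrs, text)
        return None

    def startElement(self, name, attrs):
        self._stack.append([self._find_text_handler(name), attrs, []])

    def characters(self, data, start, length):
        # Slice only when the innermost open element has a text handler.
        if self._stack and self._stack[-1][0] is not None:
            self._stack[-1][2].append(data[start:start + length])

    def endElement(self, name):
        handler, attrs, parts = self._stack.pop()
        if handler is not None:
            handler(attrs, "".join(parts))


class TitleGrabber(LazyTextSax):
    """Example client that only cares about TITLE elements."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def text_TITLE(self, attrs, text):
        self.titles.append((attrs.get("lang"), text))


# Drive the handler by hand, the way a parser would.
raw = "<DOC><TITLE lang='en'>easySAX</TITLE><BODY>ignored</BODY></DOC>"
grabber = TitleGrabber()
grabber.startElement("DOC", {})
grabber.startElement("TITLE", {"lang": "en"})
grabber.characters(raw, raw.index("easySAX"), len("easySAX"))
grabber.endElement("TITLE")
grabber.startElement("BODY", {})
grabber.characters(raw, raw.index("ignored"), len("ignored"))
grabber.endElement("BODY")
grabber.endElement("DOC")
print(grabber.titles)
```

Since TitleGrabber defines neither text_BODY nor textUnknown, the BODY
text is never sliced or stored in this sketch.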

I'll give the interface another rotation and then post it again. (More
comments are of course very welcome.)

--Lars M.