[XML-SIG] sgmllib has problems with dots in tag names

Fredrik Lundh fredrik at pythonware.com
Fri Jul 16 10:05:11 EDT 1999


Andreas Jung <ajung at sz-sb.de> wrote:
> On Fri, Jul 16, 1999 at 09:17:03AM -0400, Fred L. Drake wrote:
> > 
> >   Ok, I've poked at the standard sgmllib a bit to see what the problem
> > is.  The parser is recognizing the start and end tags.  Once
> > recognized, it is looking for the handler methods start_*() / end_*()
> > or do_*().  Since there's a dot in the name, these methods are not
> > defined, and the unknown_*tag() methods are called instead of the
> > handle_*tag() methods.
> 
> 
> >   It should be easy to override the unknown_*tag() methods to use a
> > table-based dispatcher or perform some form of name mangling, then
> > pass known tags through to the handle_*tag() methods or whatever.
> > This seems to be the easiest way to deal with the situation in the
> > short term.
> 
> That's a solution that works with the standard sgmllib from the Python
> distribution. However, this solution does not work with the sgmllib from
> xml.parsers. I'm not sure whether tag names containing dots are valid in
> XML. That might explain the different behaviour, but I don't think
> that's the reason. Maybe I'll find the real reason over the weekend.

iirc, the sgmllib in xml.parsers is an extended version
which uses the sgmlop accelerator, if installed.  it's
based on 1.5.1's standard sgmllib, which did not accept
dots in tag names.  I suppose nobody's gotten around to
fixing it...

(btw, the sgmlop accelerator recognizes dots just fine,
so one way to get around this problem is to install the
accelerator...)
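
for reference, the unknown_*tag() override Fred describes could look
roughly like the sketch below.  this is an illustration under stated
assumptions, not code from either sgmllib: the mixin name, the
dots-to-underscores mangling rule, and the unhandled_tag() hook are all
made up for the example, and the demo calls the hooks by hand instead of
running a real parser.  it only helps when the parser itself already
recognizes dotted tags and reaches the unknown_*tag() methods.

```python
class DottedTagDispatchMixin:
    """Route tags such as 'doc.title' to start_doc_title() / end_doc_title()."""

    @staticmethod
    def mangle(tag):
        # Assumed mangling rule (not from sgmllib): dots become underscores.
        return tag.replace(".", "_")

    def unknown_starttag(self, tag, attrs):
        name = self.mangle(tag)
        # Try start_* first, then do_*, like the handler lookup described above.
        method = (getattr(self, "start_" + name, None)
                  or getattr(self, "do_" + name, None))
        if method is None:
            self.unhandled_tag(tag)
        else:
            method(attrs)

    def unknown_endtag(self, tag):
        method = getattr(self, "end_" + self.mangle(tag), None)
        if method is None:
            self.unhandled_tag(tag)
        else:
            method()

    def unhandled_tag(self, tag):
        # Hypothetical hook for tags with no handler; ignored by default.
        pass


class DemoHandler(DottedTagDispatchMixin):
    """Stand-in for a parser subclass; records the calls it receives."""

    def __init__(self):
        self.events = []

    def start_doc_title(self, attrs):
        self.events.append(("start", "doc.title", attrs))

    def end_doc_title(self):
        self.events.append(("end", "doc.title"))


# Simulate what the parser would do on <doc.title lang="en">...</doc.title>.
handler = DemoHandler()
handler.unknown_starttag("doc.title", [("lang", "en")])
handler.unknown_endtag("doc.title")
handler.unknown_starttag("no.such.tag", [])  # falls through to unhandled_tag()
print(handler.events)
```

with the standard sgmllib, the mixin would presumably be listed ahead of
sgmllib.SGMLParser in the base classes so that its unknown_*tag()
methods take precedence.  note that this mangling makes "doc.title" and
"doc_title" share a handler; a table-based dispatcher avoids that.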

</F>




