[XML-SIG] sgmllib has problems with dots in tag names
Fred L. Drake, Jr. <fdrake@acm.org>
Fri, 16 Jul 1999 09:17:03 -0400 (EDT)
Andreas Jung writes:
> The SGML parsers from the standard sgmllib and the XML sgmllib were both
> unable to parse SGML tags with dots in the tag name, like <TI.DOC>. The
> parsers' callback functions only get the first part of the tag name (before
> the dot) as an argument (in this case 'TI'). Because the tags are valid SGML
> tags, this is a bit annoying. OK, one could work around it by replacing
> all dots in tags with underscores, but that's not a clean solution :-)
Andreas,
OK, I've poked at the standard sgmllib a bit to see what the problem
is. The parser is recognizing the start and end tags. Once a tag is
recognized, the parser looks for the handler methods start_*() / end_*()
or do_*(). Since there's a dot in the name (and a dot cannot appear in
a Python method name), these methods are not defined, and the
unknown_*tag() methods are called instead of the handle_*tag() methods.
It should be easy to override the unknown_*tag() methods to use a
table-based dispatcher or perform some form of name mangling, and then
pass known tags through to the handle_*tag() methods or whatever.
This seems to be the easiest way to deal with the situation in the
short term.
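The name-mangling variant might look roughly like the sketch below. It is
only an illustration under stated assumptions: DottedTagMixin, mangle() and
DocParser are made-up names, the dot-to-underscore rule is just one possible
mangling, and StandInParser is a stand-in that mimics the handle_*tag() hooks
so the example runs without sgmllib. Tag names are written in lower case on
the assumption that the parser passes them that way, and do_*() handlers are
left out for brevity.

```python
class StandInParser:
    """Stand-in for the parser base class (an assumption, not sgmllib itself)."""

    def handle_starttag(self, tag, method, attrs):
        # Call the start handler with the attribute list.
        method(attrs)

    def handle_endtag(self, tag, method):
        # Call the end handler with no arguments.
        method()


class DottedTagMixin:
    """Route tags with dots in their names to mangled handler methods."""

    @staticmethod
    def mangle(tag):
        # Hypothetical mangling rule: 'ti.doc' -> 'ti_doc'.
        return tag.replace(".", "_")

    def unknown_starttag(self, tag, attrs):
        # Look for start_<mangled name>; ignore the tag if there is none.
        method = getattr(self, "start_" + self.mangle(tag), None)
        if method is not None:
            self.handle_starttag(tag, method, attrs)

    def unknown_endtag(self, tag):
        # Look for end_<mangled name>; ignore the tag if there is none.
        method = getattr(self, "end_" + self.mangle(tag), None)
        if method is not None:
            self.handle_endtag(tag, method)


class DocParser(DottedTagMixin, StandInParser):
    """Example parser that records events for <TI.DOC> tags."""

    def __init__(self):
        self.events = []

    def start_ti_doc(self, attrs):
        self.events.append(("start", "ti.doc", attrs))

    def end_ti_doc(self):
        self.events.append(("end", "ti.doc"))
```

With a real parser, the mixin would be listed before the parser class in the
bases so that its unknown_*tag() methods take precedence.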
If you have any suggestions for a better approach to take, I'd love
to hear them. It may not be unreasonable to use a mechanism similar to
that used by xmllib (a table of registered handler methods).
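A table of registered handlers could be sketched along these lines. This is
a guess at one possible shape, not an existing API: the tag_handlers
attribute, the TableDispatchMixin class and the handler names are all
invented for illustration, and the tag name is written in lower case on the
assumption that the parser passes it that way.

```python
class TableDispatchMixin:
    """Dispatch otherwise-unknown tags through a table of handler names."""

    # Hypothetical table: tag name -> (start handler name, end handler name).
    # Either entry may be None if no handler is registered for that event.
    tag_handlers = {}

    def unknown_starttag(self, tag, attrs):
        start_name, _ = self.tag_handlers.get(tag, (None, None))
        if start_name is not None:
            getattr(self, start_name)(attrs)

    def unknown_endtag(self, tag):
        _, end_name = self.tag_handlers.get(tag, (None, None))
        if end_name is not None:
            getattr(self, end_name)()


class TableDocParser(TableDispatchMixin):
    """Example parser; handler names need not follow any naming scheme."""

    tag_handlers = {"ti.doc": ("begin_document", "finish_document")}

    def __init__(self):
        self.events = []

    def begin_document(self, attrs):
        self.events.append(("begin", attrs))

    def finish_document(self):
        self.events.append(("finish",))
```

Because the table is keyed by the tag name as it appears in the document, no
mangling is needed and any character that is legal in a tag name can be used.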
-Fred
--
Fred L. Drake, Jr. <fdrake@acm.org>
Corporation for National Research Initiatives