[XML-SIG] sgmllib has problems with dots in tag names

Fred L. Drake, Jr. <fdrake@acm.org>
Fri, 16 Jul 1999 09:17:03 -0400 (EDT)

Andreas Jung writes:
 > The SGML parsers from the standard sgmllib and the XML sgmllib are both
 > unable to parse SGML tags with dots in the tag name, like <TI.DOC>. The
 > parsers' callback functions only get the first part of the tag name (before
 > the dot) as an argument (in this case 'TI'). Because the tags are valid SGML
 > tags, this is a bit annoying. OK, one could work around it by replacing
 > all dots in tags with underscores, but that's not a clean solution :-)

  Ok, I've poked at the standard sgmllib a bit to see what the problem
is.  The parser does recognize the start and end tags.  Once a tag is
recognized, it looks for the handler methods start_*() / end_*()
or do_*().  Since there's a dot in the name, and a dot is not legal in
a Python method name, these methods are not defined, and the
unknown_*tag() methods are called instead of the handle_*tag() methods.
  It should be easy to override the unknown_*tag() methods to use a
table-based dispatcher or perform some form of name mangling, and then
pass known tags through to the handle_*tag() methods or whatever.
This seems to be the easiest way to deal with the situation in the
short term.
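  As a rough illustration of the name-mangling variant, here is a
minimal sketch.  The names DottedTagMixin, mangle(), DocHandler and
start_ti_doc() / end_ti_doc() are made up for this example and are not
part of sgmllib.  The sketch calls the handler directly rather than
going through handle_*tag(), and it is exercised on its own here; in
real use it would presumably be listed before sgmllib.SGMLParser in
the base classes so that its unknown_*tag() methods take precedence.

```python
class DottedTagMixin:
    """Hypothetical mixin: route a tag such as TI.DOC to start_ti_doc()."""

    def mangle(self, tag):
        # Map characters that are illegal in a method name to underscores.
        return tag.lower().replace(".", "_").replace("-", "_")

    def unknown_starttag(self, tag, attrs):
        # Called by the parser when no start_*/do_* method matched.
        handler = getattr(self, "start_" + self.mangle(tag), None)
        if handler is not None:
            handler(attrs)

    def unknown_endtag(self, tag):
        # Called by the parser when no end_* method matched.
        handler = getattr(self, "end_" + self.mangle(tag), None)
        if handler is not None:
            handler()


class DocHandler(DottedTagMixin):
    """Example subclass that records the events it receives."""

    def __init__(self):
        self.events = []

    def start_ti_doc(self, attrs):
        self.events.append(("start", "ti.doc", attrs))

    def end_ti_doc(self):
        self.events.append(("end", "ti.doc"))
```

  One caveat with mangling: two distinct tags such as TI.DOC and
TI_DOC end up at the same handler, which may or may not matter for a
given DTD.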
  If you have any suggestions for a better approach to take, I'd love
to hear them.  It may not be unreasonable to use a mechanism similar to
that used by xmllib (a table of registered handler methods).
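
  A table of registered handlers could look roughly like the sketch
below.  The table is keyed by the full tag name, so dots need no
special treatment.  The names TableDispatchMixin, DocParser,
open_doc() and close_doc() are invented for the example, and the
layout of the table (tag name mapped to a pair of method names) is an
assumption loosely modelled on the xmllib idea, not a copy of its
interface.

```python
class TableDispatchMixin:
    """Hypothetical mixin: look up handlers in a table keyed by tag name."""

    # Maps tag name -> (start method name, end method name).
    # Either entry of the pair may be None.
    elements = {}

    def unknown_starttag(self, tag, attrs):
        start_name, _ = self.elements.get(tag.lower(), (None, None))
        if start_name is not None:
            getattr(self, start_name)(attrs)

    def unknown_endtag(self, tag):
        _, end_name = self.elements.get(tag.lower(), (None, None))
        if end_name is not None:
            getattr(self, end_name)()


class DocParser(TableDispatchMixin):
    """Example subclass that registers handlers for a dotted tag."""

    elements = {"ti.doc": ("open_doc", "close_doc")}

    def __init__(self):
        self.depth = 0
        self.seen = []

    def open_doc(self, attrs):
        self.depth += 1
        self.seen.append(dict(attrs))

    def close_doc(self):
        self.depth -= 1
```

  Tags that are not in the table are silently ignored in this sketch;
a real parser would probably want to report or collect them.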


Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives