[XML-SIG] parsers and XML

Paul Prescod paul@prescod.net
Wed, 16 Aug 2000 00:13:16 -0400

travish wrote:
> Hi... I was taking a look at some of the docs, code, and examples,
> and was a bit surprised about a number of things.  Below are some
> comments, problems, diffs, etc.  You may already know some of this.
> a) most of the XML "parsers" appear to be lexers
> b) none of the examples are of sufficient/substantial complexity
>    (e.g. recursive nesting, deep/complex hierarchy)
>    If anyone has suggestions on what kind of parser to use as a back end
>    (yapps?  kjParsing?  etc.) I'd be interested to hear it.

Let's say we divide it this way:

 * a lexer is a tool that recognizes lexical boundaries (usually
boundaries are described as regular expressions)

 * a parser is a tool that organizes the stream of lexical events into a
*logical* tree structure. It may or may not generate an AST, but it
will at least call your methods in a tree-nested fashion.
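
To make that division concrete, here is a rough sketch for a toy subset
of XML (bare tags and text only; no attributes, comments, CDATA, or
entities). The names lex, parse, and Recorder are made up for this
example and do not come from any real XML library.

```python
import re

# Toy lexer: finds tag and text boundaries with one regular expression.
# Anything the pattern does not match is silently skipped.
TOKEN = re.compile(r"<(/?)([A-Za-z_][\w.-]*)\s*(/?)>|([^<]+)")

def lex(text):
    """Yield flat (kind, value) tokens; nesting is not checked here."""
    for m in TOKEN.finditer(text):
        closing, name, empty, chars = m.groups()
        if chars is not None:
            yield ("text", chars)
        elif closing:
            yield ("end", name)
        else:
            yield ("start", name)
            if empty:  # <c/> counts as a start followed by an end
                yield ("end", name)

def parse(tokens, handler):
    """Check nesting with a stack and call the handler in tree order."""
    stack = []
    for kind, value in tokens:
        if kind == "start":
            stack.append(value)
            handler.start(value, len(stack))
        elif kind == "end":
            if not stack or stack[-1] != value:
                raise ValueError("mismatched end tag: " + value)
            handler.end(value, len(stack))
            stack.pop()
        else:
            handler.text(value, len(stack))
    if stack:
        raise ValueError("unclosed element: " + stack[-1])

class Recorder:
    """Hypothetical handler that records (depth, kind, value) callbacks."""
    def __init__(self):
        self.events = []
    def start(self, name, depth):
        self.events.append((depth, "start", name))
    def end(self, name, depth):
        self.events.append((depth, "end", name))
    def text(self, chars, depth):
        self.events.append((depth, "text", chars))

rec = Recorder()
parse(lex("<a><b>hi</b><c/></a>"), rec)
for depth, kind, value in rec.events:
    print("  " * depth + kind + " " + value)
```

Note that lex() is perfectly happy with "<a><b></a></b>"; only parse()
objects to it. That is the difference I mean.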

Well, all XML parsers I know of perform both functions. It might seem at
first that the "parse" part of the task is trivial for XML, but it isn't
once you consider entities.
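
A small illustration of the entity point, as a sketch using Python's
xml.parsers.expat module. The document and the entity name "chunk" are
invented for the example. The body of <doc> contains no literal child
tags, yet the parser reports two <item> children, because the entity's
replacement text is itself markup that has to be parsed and nested.

```python
import xml.parsers.expat

# The internal DTD subset declares an entity whose replacement text
# contains elements. The document body only references the entity.
DOC = """<!DOCTYPE doc [
  <!ENTITY chunk "<item>one</item><item>two</item>">
]>
<doc>&chunk;</doc>"""

def element_events(text):
    """Return the start/end element events expat reports for text."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: events.append(("start", name))
    parser.EndElementHandler = lambda name: events.append(("end", name))
    parser.Parse(text, True)  # True marks this as the final chunk
    return events

for event in element_events(DOC):
    print(event)
```

A purely regular-expression-based scan of the <doc> element would see
only the reference "&chunk;" and miss the nested structure entirely.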

Obviously you expect more from a parser than just building your logical
tree. If you state exactly what you are looking for, we might be able to
point you to it or develop it. Note that a lot of people with very
serious parser theory backgrounds have worked with XML, so the
relationship with formal parsing theory is pretty well understood.
 Paul Prescod - Not encumbered by corporate consensus
"I don't want you to describe to me -- not ever -- what you were doing
to that poor boy to make him sound like that; but if you ever do it
again, please cover his mouth with your hand," Grandmother said.
	-- John Irving, "A Prayer for Owen Meany"