[XML-SIG] dumping an XML parser skeleton from DTD input

Eugene.Leitl@lrz.uni-muenchen.de
Sat, 10 Mar 2001 16:41:09 +0100


"Thomas B. Passin" wrote:

> You are mixing up several concepts or processing steps.

I realize that. It comes from being a newbie with a deadline
breathing down my neck.
 
> 1) Parsing xml.
> This means to get hold of the structural elements of the xml document and give
> them to another application for further processing.  There are many xml
> parsers out there, some command line and some not.  It's almost certainly not
> worth it to roll your own.

I know that, but apparently my senior cow-orkers don't. It's a C/C++ shop
with an occasional sprinkling of Java; my choice of Python is purely
personal (note to self: don't goof this one up).
 
Before I try selling them on the DOM thing, I'd rather know what I'm
doing. It cost them three days to whip up their object tree XML parser
in Java.
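
For comparison, here is a minimal sketch of step 1 done with an existing
parser rather than a hand-rolled one. It assumes Python's standard xml.sax
module (not PyXML or 4Suite specifically); the sample document and the
ElementLogger name are made up for illustration.

```python
import xml.sax


class ElementLogger(xml.sax.ContentHandler):
    """Records each start/end element event the parser reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Called by the parser when an opening tag is seen.
        self.events.append(("start", name))

    def endElement(self, name):
        # Called by the parser when the matching closing tag is seen.
        self.events.append(("end", name))


def parse_events(xml_text):
    """Parse a string and return the list of element events."""
    handler = ElementLogger()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events


# Hypothetical sample document, not taken from any real project.
SAMPLE = "<order><item sku='a1'/><item sku='b2'/></order>"
events = parse_events(SAMPLE)
```

The parser does the tokenizing; the application only decides what to do
with each event.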

> 2) Creating a tree-like structure to represent the structure of the xml
> document.
> The DOM is an API for a tree-like representation.  Most major parsers out
> there either include a DOM api or can work with another DOM API.  (SAX is a
> non-DOM api, but the output of a sax processor can be used to build a tree,
> too).  The DOM is an object oriented api.

They (said cow-orkers) insist on an object tree based approach.
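
A rough sketch of what such an object tree looks like through the DOM api,
assuming the standard library's xml.dom.minidom as the implementation (PyXML
or 4Suite may differ in detail). The sample document and the item_names
helper are hypothetical.

```python
from xml.dom import minidom

# Hypothetical sample document used only for illustration.
SAMPLE = (
    "<order id='42'>"
    "<item sku='a1'>widget</item>"
    "<item sku='b2'>gadget</item>"
    "</order>"
)


def item_names(xml_text):
    """Build a DOM tree and walk it, returning the root tag and item data."""
    doc = minidom.parseString(xml_text)
    root = doc.documentElement
    result = []
    for item in root.getElementsByTagName("item"):
        # Element text is stored in child Text nodes, not on the element.
        text = "".join(
            node.data for node in item.childNodes
            if node.nodeType == node.TEXT_NODE
        )
        result.append((item.getAttribute("sku"), text))
    return root.tagName, result


summary = item_names(SAMPLE)
```

Every element, attribute and piece of text is an object in the tree, which
is presumably what an "object tree based approach" amounts to.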
 
> 3) DOM manipulation, using the DOM api. There are already good processors that
> can use the DOM api to manipulate actual, populated DOM trees.  So don't
> roll your own there, either.

Does http://4suite.org/download.epy fit the bill? Its regression tests
dumped core on me at work; let's see whether I can get it running at home.
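
Manipulating a populated tree through the DOM api might look roughly like
the sketch below. It assumes xml.dom.minidom from the standard library, not
4Suite itself; the add_item helper and the sample data are invented for the
example.

```python
from xml.dom import minidom


def add_item(xml_text, sku, label):
    """Parse a document, append a new <item> to the root, and re-serialize."""
    doc = minidom.parseString(xml_text)
    # New nodes are created through the owning document, then attached.
    item = doc.createElement("item")
    item.setAttribute("sku", sku)
    item.appendChild(doc.createTextNode(label))
    doc.documentElement.appendChild(item)
    return doc.documentElement.toxml()


# Hypothetical input document.
updated = add_item(
    "<order><item sku='a1'>widget</item></order>", "b2", "gadget"
)
items_after = minidom.parseString(updated).getElementsByTagName("item")
```

The same createElement/appendChild calls are part of the DOM Level 1 core,
so code like this should port to other DOM implementations with few changes.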
 
> 4) You don't need a DTD, but it's a good idea to make one anyway because then
> you can use a validating parser to check that the first xml examples that you
> build are "valid" - i.e., put together correctly from a structural point of
> view.  It's amazing how easy it is to accidentally create something other
> than what you thought you were making.

I think Emacs psgml mode will take care of that.
 
> Otherwise, you can start simple with no DTD and later define one after you
> have some hands-on experience working with xml.
> 
> As Martin said, the Python PyXML package is very good.  There's also the

Downloading it now.

> Microsoft xml processor, which can be written to as a COM object, in VBscript,
> or in Javascript.  There are several good java processors, and some good Perl
> ones.  Python would be the quickest and easiest to use, especially if you are
> not already up to speed in one of the other languages.  Even if you are,
> Python will be faster and easier to use than one of the statically typed
> compiled languages like Java.
> 
> Get a good book or two, like Wrox's Professional XML and XML in a Nutshell
> from O'Reilly, to mention only two of the good ones out there.

I've got myself Learning XML from ORA, which was a breath of fresh air in
comparison to SGML & XML Cookbook.

> Yes, the wheel has already been invented.  But core dumps aren't going to be
> very useful.  Do examples from a book or tutorial site, fix them til they run
> right, then start morphing them closer to what you want to do.  You don't need
> to try to understand a DOM tree from a core dump.  Learn about the api

The 4Suite DOM package dumped core on me when I was running regression tests as
part of the build. Perhaps I should try sticking with PyXML at first.

> instead.

Thanks for all the good advice.