[XML-SIG] Proposal: command-line interface to parser
Matt G.
matt_g_@hotmail.com
Tue, 09 Jul 2002 07:54:37 +0000
>From: Uche Ogbuji <uche.ogbuji@fourthought.com>
>To: "Matt G." <matt_g_@hotmail.com>
>CC: xml-sig@python.org
>Subject: Re: [XML-SIG] Proposal: command-line interface to parser
>Date: Mon, 08 Jul 2002 22:15:18 -0600
>
> > A quick search (i.e. 'find PyXML-0.7.1 -perm +111') doesn't turn up any
> > general-purpose applications of the sort I'm looking for - sorry if it's
> > there and I missed it (but why not 'chmod +x' it?).
> >
> > Anyhow, I think it'd be immensely useful to include a command-line tool
> > that performs at least the following functions:
> >
> > * XML validation - returns a nonzero error code and
> > pretty/useful error message if validation fails
>
>The 4xml command in 4Suite CVS does this, except for the error code return,
>which is a good idea. Do you have some suggestions for good error codes to
>use?
I don't care about the actual values, beyond zero == success and nonzero ==
failure. This is very important for writing scripts and makefiles. I even
have my prompt string configured to show me the return code of the last
command (but then I'm the kind of nut who has his username, pwd, machine
name, and the number of running and stopped jobs in his xterm titlebar).
Some of this should be fairly obvious, but here's my wish list for
return-code behavior:
* nonzero should always be returned if the input is not well-formed
* nonzero should be returned if validation is enabled and the document
  fails to validate
* no output (i.e. XML written to either stdout or a file) should be
  produced if the program exits with a nonzero return code. If an output
  file has already been written, it should be deleted before the program
  exits.
* a switch should exist for treating warnings as errors. By default,
  warnings should NOT cause the program to exit with a nonzero return code.
  If the switch is given, warnings should cause the program to (eventually)
  terminate with a nonzero return code.
The point about not producing output is especially important when the tool
is used from a Makefile, since make would otherwise treat a leftover output
file as an up-to-date target. If this is not possible, then the exit message
should probably even say "bad output written to stdout", so that the user
knows to clean up the output if it's either redirected to a file or piped
into other commands.
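As a rough illustration of the behavior described above, here is a minimal
Python sketch. It checks well-formedness with the standard xml.parsers.expat
module, writes to a temporary file, and publishes the output only on success.
The function name process, the ".partial" suffix, and the "no XML
declaration" warning are assumptions made up for the example; none of them
come from 4xml or PyXML.

```python
import os
import sys
import xml.parsers.expat


def process(src_path, out_path, warnings_as_errors=False):
    """Copy src_path to out_path if it is well-formed; return an exit code."""
    tmp_path = out_path + ".partial"  # hypothetical naming scheme
    warnings = []
    try:
        with open(src_path, "rb") as src:
            data = src.read()
        with open(tmp_path, "wb") as out:
            out.write(data)  # output may already exist when an error shows up
            parser = xml.parsers.expat.ParserCreate()
            parser.Parse(data, True)  # raises ExpatError if not well-formed
        # Illustrative warning only; a real tool would define its own.
        if not data.lstrip().startswith(b"<?xml"):
            warnings.append("no XML declaration")
        for message in warnings:
            sys.stderr.write("warning: %s: %s\n" % (src_path, message))
        if warnings and warnings_as_errors:
            raise ValueError("warnings treated as errors")
    except (OSError, ValueError, xml.parsers.expat.ExpatError) as err:
        sys.stderr.write("error: %s: %s\n" % (src_path, err))
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # never leave bad output behind
        return 1
    os.replace(tmp_path, out_path)  # publish the output only on success
    return 0
```

A command-line wrapper would finish with sys.exit(process(...)), so that a
script or Makefile rule stops at the first bad document.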
BTW, I assume all your options are 'getopt'-style (i.e. multi-letter options
begin with '--', while single-letter options use '-' and can be combined
when they take no parameter).
I have a neat Python module, built on top of getopt.py, that lets you
specify a short option, long option, and description for each option. It
handles '--help' (though it gives you the opportunity to provide text to go
before and after the options summary). This allows you to centralize your
management of option listing and documentation, and could even tie into an
automated system to generate user documentation of your command-line
interface. If you're interested, check out
sourceforge.net/projects/xml-extractor/ (you could either find it in
lib/cmdopts.py, or just download xlf_to_wfx.tar.gz).
Furthermore, on the usability front, I believe that any output file argument
should be supplied via a '--output' or '-o' option. In fact, the only
non-option file argument(s) should be input files. Taking an output file as
a non-option argument (as 'tar' does) is particularly pernicious, since it
could result in a file getting clobbered if the user isn't careful or
knowledgeable.
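To show the general idea of a single option table driving both parsing and
'--help', here is a small sketch on top of the standard getopt module. It is
not the cmdopts.py interface; the OPTIONS table, the usage and parse
functions, and the particular flags are all invented for illustration. Note
that the output file is only ever given via '-o'/'--output', and every
non-option argument is treated as an input file.

```python
import getopt

# (short, long, argument name or None, description); entries are illustrative.
OPTIONS = [
    ("h", "help", None, "show this help and exit"),
    ("o", "output", "FILE", "write output to FILE instead of stdout"),
    ("v", "validate", None, "validate, not just check well-formedness"),
    ("W", "warnings-as-errors", None, "exit with a nonzero code on warnings"),
]


def usage(before="", after=""):
    """Build the --help text from the same table that drives parsing."""
    lines = [before] if before else []
    for short, long_name, arg, text in OPTIONS:
        flags = "-%s, --%s" % (short, long_name)
        if arg:
            flags += "=" + arg
        lines.append("  %-28s %s" % (flags, text))
    if after:
        lines.append(after)
    return "\n".join(lines)


def parse(argv):
    """Return (settings, input_files) for a list such as sys.argv[1:]."""
    shortopts = "".join(s + (":" if a else "") for s, _, a, _ in OPTIONS)
    longopts = [name + ("=" if a else "") for _, name, a, _ in OPTIONS]
    by_flag = {}
    for short, long_name, arg, _ in OPTIONS:
        by_flag["-" + short] = by_flag["--" + long_name] = (long_name, arg)
    # getopt.getopt raises getopt.GetoptError for unknown options.
    opts, input_files = getopt.getopt(argv, shortopts, longopts)
    settings = {}
    for flag, value in opts:
        long_name, arg = by_flag[flag]
        settings[long_name] = value if arg else True
    return settings, input_files
```

For example, parse(["-vW", "-o", "out.xml", "in.xml"]) yields the settings
for 'validate', 'warnings-as-errors', and 'output', plus ["in.xml"] as the
only input file.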
> > * Listing URIs of all external entities referenced (defined
> > would be okay, too, but only as an option)
>
>Doesn't do this yet, but if you post a feature request on the 4Suite SF
>feature request tracker, I can try to add it soon.
If you want to list only the entities that are actually referenced (which I
think is the most reasonable behavior), then ENTITY- and ENTITIES-type
attributes make this slightly more complicated (though it shouldn't be much
trouble if you have a parsed representation of the DTD lying around). For
output, the primary behavior should be to resolve the entities to their
ultimate SYSTEM IDs; however, it might be a nice feature to have the option
of listing only the PUBLIC IDs, or whichever ID is given in the ENTITY
declaration.
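One possible way to collect the referenced entities, sketched with the
standard xml.parsers.expat module rather than anything in 4xml: parsed
entities are caught when they are referenced, while unparsed entities are
matched against attributes declared as ENTITY or ENTITIES. The function name
and the base_uri parameter are assumptions for the example, relative SYSTEM
IDs are simply joined to the base with urljoin, and only the internal DTD
subset is examined.

```python
import xml.parsers.expat
from urllib.parse import urljoin


def referenced_external_entities(data, base_uri=""):
    """Return resolved system IDs of external entities that data references."""
    unparsed = {}      # entity name -> resolved system ID (has a notation)
    entity_attrs = {}  # (element, attribute) -> "ENTITY" or "ENTITIES"
    found = []

    def note(system_id):
        if system_id not in found:
            found.append(system_id)

    def entity_decl(name, is_param, value, base, system_id, public_id, notation):
        if notation is not None:  # unparsed entity, usable only in attributes
            unparsed[name] = urljoin(base or "", system_id)

    def attlist_decl(element, attribute, attr_type, default, required):
        if attr_type in ("ENTITY", "ENTITIES"):
            entity_attrs[(element, attribute)] = attr_type

    def start_element(element, attrs):
        for attribute, value in attrs.items():
            if (element, attribute) in entity_attrs:
                for name in value.split():
                    if name in unparsed:
                        note(unparsed[name])

    def external_ref(context, base, system_id, public_id):
        note(urljoin(base or "", system_id))
        return 1  # continue parsing without reading the entity itself

    parser = xml.parsers.expat.ParserCreate()
    parser.SetBase(base_uri)
    parser.EntityDeclHandler = entity_decl
    parser.AttlistDeclHandler = attlist_decl
    parser.StartElementHandler = start_element
    parser.ExternalEntityRefHandler = external_ref
    parser.Parse(data, True)
    return found
```

Entities that are declared but never referenced are left out of the result;
listing PUBLIC IDs instead would mean recording public_id in the same two
handlers.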
>You're right that these features are very handy, which is why I added 4xml
>:-)
Wow - this would be VERY cool! Thanks for the reply & cooperation, and I'd
be glad to do whatever testing or make any other contributions I can!!
I'm out of time just now, but I'll check out the CVS 4xml ASAP!
Matt Gruenke