[XML-SIG] Proposal: command-line interface to parser

Matt G. matt_g_@hotmail.com
Sun, 07 Jul 2002 03:58:12 +0000


A quick search (i.e. 'find PyXML-0.7.1 -perm +111') doesn't turn up any 
general-purpose application of the sort I'm looking for - sorry if one is 
there and I missed it (but why not 'chmod +x' it?).

Anyhow, I think it'd be immensely useful to include a command-line tool that 
performs at least the following functions:

  * XML validation - returns a nonzero error code and
    pretty/useful error message if validation fails

  * XML document "flattening" (i.e. writes out a copy of the parsed
    document, inlining external entities, potentially applying
    DTD attribute defaults, and potentially also validating).
    This would be even more useful if it supported XInclude.

  * Listing the URIs of all external entities referenced (listing
    all those defined would be okay too, but only as an option)
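
A minimal sketch of the first function, the error-code behaviour, might look 
like the following.  It assumes Python's standard xml.sax module, whose 
expat-based parser does not validate against a DTD, so it only checks 
well-formedness; a validating parser would have to be swapped in for real 
validation.  The names check() and main() are made up for illustration.

```python
import sys
import xml.sax
from xml.sax.handler import ContentHandler


def check(data: bytes, name: str = "<input>") -> int:
    """Return 0 if data parses cleanly, 1 otherwise (well-formedness only)."""
    try:
        # A do-nothing handler: we only care whether parsing succeeds.
        xml.sax.parseString(data, ContentHandler())
    except xml.sax.SAXParseException as err:
        # Report in the familiar file:line:column style on stderr.
        print(
            f"{name}:{err.getLineNumber()}:{err.getColumnNumber()}: "
            f"{err.getMessage()}",
            file=sys.stderr,
        )
        return 1
    return 0


def main(argv: list) -> int:
    """Check every file named in argv; nonzero if any of them fails."""
    status = 0
    for path in argv:
        with open(path, "rb") as handle:
            status |= check(handle.read(), path)
    return status
```

A script wrapper would end with sys.exit(main(sys.argv[1:])), so that a 
Makefile or shell script can test the exit status.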


My case for validation is that it's a useful thing to have if you're 
feeding XML into something that either doesn't use a validating parser or 
uses one with obscure error messages.  Also, having a command-line interface 
to the parser (both validating and non-validating) would be useful for those 
who want to do testing and benchmarking of XML parsers.

XML flattening makes sense if you consider that SGML and XML Catalog 
support is the exception among XML-based applications.  I'm not quite sure 
how one would deal with external unparsed entities (perhaps resolve their 
URIs to local system paths?), but it should otherwise be pretty 
straightforward.
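
As a rough sketch of flattening, assuming Python's standard xml.sax module 
rather than anything specific to PyXML: turn on external general entities 
and feed the events to XMLGenerator, which writes them back out.  The 
function name flatten() is made up for illustration.  This version drops the 
DOCTYPE, does nothing about unparsed entities or XInclude, and does not 
validate.

```python
import io
import xml.sax
from xml.sax.handler import feature_external_ges
from xml.sax.saxutils import XMLGenerator


def flatten(path: str) -> str:
    """Re-serialize the document at path with external entities inlined."""
    out = io.StringIO()
    parser = xml.sax.make_parser()
    # Off by default in current Python releases; only enable this for
    # trusted input, since it makes the parser open whatever the
    # document's entity declarations point at.
    parser.setFeature(feature_external_ges, True)
    # XMLGenerator is a ContentHandler that writes the events it
    # receives back out as XML text.
    parser.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    parser.parse(path)
    return out.getvalue()
```

Relative system identifiers are resolved against the location of the 
document being parsed, so the entity files need to sit where the 
declarations say they do.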

Listing external entities is important for using Makefiles (or Ant?) to 
process XML-based source files only when necessary.  Perhaps you're familiar 
with GCC's '-M' option, which is specifically geared towards automatic 
dependency generation?
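
To illustrate the dependency-listing idea, here is a small sketch that 
assumes Python's standard xml.parsers.expat bindings and uses their 
EntityDeclHandler and ExternalEntityRefHandler callbacks.  The function 
names external_entities() and dependency_line() are made up for 
illustration.  It only sees entities declared in the internal DTD subset, 
and it records the references without fetching anything.

```python
import xml.parsers.expat


def external_entities(data: bytes):
    """Return (defined, referenced) lists of external entity system IDs."""
    defined, referenced = [], []
    parser = xml.parsers.expat.ParserCreate()

    def on_decl(name, is_parameter, value, base, system_id, public_id,
                notation):
        # Internal entities carry a value and no system ID; skip those.
        if system_id is not None:
            defined.append(system_id)

    def on_ref(context, base, system_id, public_id):
        # Record the reference but do not load it; returning 1 tells
        # expat to carry on with the rest of the document.
        referenced.append(system_id)
        return 1

    parser.EntityDeclHandler = on_decl
    parser.ExternalEntityRefHandler = on_ref
    parser.Parse(data, True)
    return defined, referenced


def dependency_line(target: str, deps: list) -> str:
    """Format a Makefile-style rule header, loosely in the spirit of -M."""
    return f"{target}: {' '.join(deps)}"
```

The output of dependency_line() could be written to a file that the 
Makefile includes, so that the target is rebuilt whenever one of the 
referenced entities changes.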

Finally, it would have value as another example for people to look at.


Anyway, I was just curious as to whether anyone else saw any value in a 
command-line interface to an XML parser with these functions.  I'd be 
willing to write it, and it doesn't seem like much work.  I haven't yet 
poked around in PyXML much.  Would it make more sense to build such a thing 
on minidom?  Would that have the resolver hooks needed for listing the 
external entities that get referenced?


Thanks for considering this idea & providing any feedback.


Matt Gruenke

