XML Schema?
Uche Ogbuji
uche at ogbuji.net
Wed Feb 14 09:00:40 EST 2001
Harry George wrote:
>
> Anyone have a python XML Schema parser/validator? I thought I saw
> comments that it wasn't being done yet as part of xml-sig. Of course,
> we don't actually need an XML Schema validator in Python (Java or C++
> renditions would do fine), but there is a social cachet to it, so
> maybe worth the effort.
I'm not personally a fan of XML Schemas, but I think this would be a
very worthwhile project. You'd probably get plenty of help as well.
> Assuming it is an open task, here is an approach. Anyone see holes in
> this, besides it being a humongous task?
>
> 1. Get the specs from OASIS-->W3C.
>
> 2. Get test cases (for schemas and for instances). There are a few
> cases at xml-conf, but I think a lot more will be needed. So I'll
> need to generate them, and that suggests a case generator, plus of
> course a test driver. I have the testcase generator and driver
> done.
>
> 3. XML Schema is basically a regular expression problem, with nodes as
> the "characters".
Hmm. I wouldn't go this far. The most basic parts of the content model
are so, but the entire data-type system and parts of the content model
need a different approach than regular grammar.
> So we can use classical lexer algorithms:
> regexpr --> NFA --> DFA. The hassles may be at the leaf nodes,
> where XML Schema has lots of special cases. I don't know if there
> are non-re constraints in the specs, but if so I'd apply them after
> the initial pass.
Interesting approach.
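For concreteness, here is a minimal sketch of the regexpr --> NFA --> DFA pipeline from step 3, with child element names as the "characters". The tuple-based content-model representation and the names build_nfa, eps_closure, nfa_to_dfa and matches are hypothetical, chosen only for this illustration. It covers sequence, choice and repetition, and ignores things like xs:all groups, wildcards and numeric minOccurs/maxOccurs.

```python
from itertools import count

# Content models as a tiny AST over element names (a hypothetical
# representation chosen for this sketch, not anything from the spec):
#   ("name", "title")       one <title> child
#   ("seq", m1, m2, ...)    m1 then m2 then ...   (like xs:sequence)
#   ("alt", m1, m2, ...)    one of m1, m2, ...    (like xs:choice)
#   ("star", m)             zero or more m        (like maxOccurs="unbounded")

def build_nfa(model):
    """Thompson construction. Returns (start, accept, trans), where trans maps
    each state to a list of (label, next_state); label None is an epsilon move."""
    ids = count()
    trans = {}

    def new_state():
        s = next(ids)
        trans[s] = []
        return s

    def build(m):
        start, accept = new_state(), new_state()
        kind = m[0]
        if kind == "name":
            trans[start].append((m[1], accept))
        elif kind == "seq":
            prev = start
            for sub in m[1:]:
                s, a = build(sub)
                trans[prev].append((None, s))
                prev = a
            trans[prev].append((None, accept))
        elif kind == "alt":
            for sub in m[1:]:
                s, a = build(sub)
                trans[start].append((None, s))
                trans[a].append((None, accept))
        elif kind == "star":
            s, a = build(m[1])
            trans[start] += [(None, s), (None, accept)]
            trans[a] += [(None, s), (None, accept)]
        else:
            raise ValueError(f"unknown content-model node {kind!r}")
        return start, accept

    start, accept = build(model)
    return start, accept, trans

def eps_closure(states, trans):
    """All NFA states reachable from `states` via epsilon moves alone."""
    seen, stack = set(states), list(states)
    while stack:
        for label, t in trans[stack.pop()]:
            if label is None and t not in seen:
                seen.add(t)
                stack.append(t)
    return frozenset(seen)

def nfa_to_dfa(start, accept, trans):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    d_start = eps_closure({start}, trans)
    d_trans, accepting = {}, set()
    todo, seen = [d_start], {d_start}
    while todo:
        S = todo.pop()
        if accept in S:
            accepting.add(S)
        moves = {}
        for s in S:
            for label, t in trans[s]:
                if label is not None:
                    moves.setdefault(label, set()).add(t)
        for label, targets in moves.items():
            T = eps_closure(targets, trans)
            d_trans[(S, label)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    return d_start, d_trans, accepting

def matches(dfa, child_names):
    """Run a sequence of child element names (the 'characters') through the DFA."""
    state, d_trans, accepting = dfa
    for name in child_names:
        state = d_trans.get((state, name))
        if state is None:
            return False
    return state in accepting

# Hypothetical model for <book>: title, author+, year?
BOOK = ("seq", ("name", "title"),
        ("name", "author"), ("star", ("name", "author")),
        ("alt", ("name", "year"), ("seq",)))
BOOK_DFA = nfa_to_dfa(*build_nfa(BOOK))
```

With this model, matches(BOOK_DFA, ["title", "author", "author", "year"]) is True, while matches(BOOK_DFA, ["title"]) is False because at least one author is required.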
> 4. Given that state machine, run schemas through the parser until it can
> build machines from valid schemas and detect invalid ones.
>
> 5. Given a sound state machine, run instance test cases through the
> package until it is passing valid instances and detecting invalid
> ones.
>
> 6. This would probably be an iterative enhancement exercise, once the
> state machine engine was in place.
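Uche's point about the datatype system, together with Harry's plan to apply non-regular constraints after the initial pass, might look something like the following second pass over leaf text. This is a sketch under stated assumptions: the facet dictionary layout, the simplified lexical patterns (Python regex syntax, not the XML Schema regex dialect) and the check_value name are all hypothetical; only the facet names minInclusive and maxInclusive come from XML Schema.

```python
import re
from decimal import Decimal

# Hypothetical derived type: an integer restricted with minInclusive and
# maxInclusive. The facet names follow XML Schema; the dict layout does not
# come from any library.
AGE_FACETS = {"base": "integer", "minInclusive": 0, "maxInclusive": 150}

# Simplified lexical spaces for two built-in types (Python regex syntax,
# not the XML Schema regex dialect).
LEXICAL = {
    "integer": re.compile(r"[+-]?[0-9]+"),
    "decimal": re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)"),
}

def check_value(text, facets):
    """Second pass over one leaf's text: lexical form first, then facets
    that compare values, which are awkward to express as a grammar."""
    text = text.strip()  # roughly stands in for whiteSpace="collapse" at the edges
    if not LEXICAL[facets["base"]].fullmatch(text):
        return False
    value = Decimal(text)  # "007" and "+7" both map to the value 7
    if "minInclusive" in facets and value < facets["minInclusive"]:
        return False
    if "maxInclusive" in facets and value > facets["maxInclusive"]:
        return False
    return True
```

Comparing in the value space means "007", "+7" and "7" are all treated alike, which is the kind of check that is simpler to do after the structural pass than to fold into the content-model automaton.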
>
> I have a lex-workalike I wrote in Modula-2, which I'll use as the
> starting point. Probably could use a SAX input approach ("next node"
> instead of "next char"), maybe with 1 lookahead.
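The "next node" instead of "next char" idea can be sketched with the standard library's xml.sax: each startElement call steps the parent's DFA by one child name, and endElement checks that the element's DFA ended in an accepting state. The DFAS table below is hand-written to keep the example self-contained (in practice it would be compiled from the schema's content models), and ContentModelHandler and validate are hypothetical names. Character data and mixed content are ignored.

```python
import xml.sax

# Hypothetical hand-built DFAs, one per element type: (start, transitions, accepting).
# book = (title, author+); title and author have no element children.
DFAS = {
    "book": (0, {(0, "title"): 1, (1, "author"): 2, (2, "author"): 2}, {2}),
    "title": (0, {}, {0}),
    "author": (0, {}, {0}),
}

class ContentModelHandler(xml.sax.ContentHandler):
    """Feeds each child element name ("next node") to its parent's DFA."""

    def __init__(self, dfas):
        super().__init__()
        self.dfas = dfas
        self.stack = []   # [element name, current DFA state] per open element
        self.errors = []

    def startElement(self, name, attrs):
        if self.stack:
            parent = self.stack[-1]
            if parent[1] is not None:  # skip parents already in error
                _, trans, _ = self.dfas[parent[0]]
                parent[1] = trans.get((parent[1], name))
                if parent[1] is None:
                    self.errors.append(f"<{name}> not allowed here in <{parent[0]}>")
        if name in self.dfas:
            self.stack.append([name, self.dfas[name][0]])
        else:
            self.errors.append(f"<{name}> has no declared content model")
            self.stack.append([name, None])

    def endElement(self, name):
        _, state = self.stack.pop()
        if state is not None and state not in self.dfas[name][2]:
            self.errors.append(f"<{name}> ended before its content model was satisfied")

def validate(xml_bytes, dfas):
    """Parse the document and return a list of content-model errors."""
    handler = ContentModelHandler(dfas)
    xml.sax.parseString(xml_bytes, handler)
    return handler.errors
```

For example, validate(b"<book><title>T</title></book>", DFAS) reports that <book> ended before its content model was satisfied. A DFA consumes one child at a time with no lookahead; XML Schema's Unique Particle Attribution constraint is meant to ensure that each child can also be matched to a single particle without looking ahead.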
Just to note: LT-XML supposedly has a Python interface and an XSchemas
validator. I still think your effort would be worthwhile, especially
given your fresh approach.
http://www.ltg.ed.ac.uk/software/xml/
--
Uche Ogbuji
Personal: uche at ogbuji.net http://uche.ogbuji.net
Work: uche.ogbuji at fourthought.com http://Fourthought.com