[XML-SIG] validating with XML schema (long)
Thomas B. Passin
tpassin at comcast.net
Tue Oct 14 23:06:46 EDT 2003
Hawkeye Parker wrote:
> i need to validate the structure/content of some xml, and i'm parsing
> etc. with python. i've been learning a bit about XML Schema and i'd
> like to confirm some basic assumptions:
>
> -validation with XML Schema (or any other validation language)
> doesn't "just happen". i.e., just because you specify an .xsd file in
> your xml, you still need to explicitly "call" it to validate the xml.
> it must be, correct?
>
You must make validation happen, but not by "calling" the xsd file named in the document: you run a validator (such as xsv) or a validating parser over the document.
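To make the point concrete, here is a minimal sketch (Python 3 standard library; the names DOC and XSI are illustrative, not from the original post) showing that an ordinary non-validating parser reads the sample document quoted later in this thread without complaint. To the parser, xsi:noNamespaceSchemaLocation is just another attribute; the schema file is never opened, so it need not even exist.

```python
import xml.etree.ElementTree as ET

# The poster's sample document. The XML declaration is omitted because the
# sample is held in a Python str rather than read as UTF-16 bytes.
DOC = """<PPSiteBuilder xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
    xsi:noNamespaceSchemaLocation='PPSiteBuilderSchema.xsd'>
  <site></site>
  <reallyReallyStupidWrongTag></reallyReallyStupidWrongTag>
</PPSiteBuilder>"""

# ElementTree checks well-formedness only. It records the schema location
# attribute like any other attribute and never fetches or applies the schema.
root = ET.fromstring(DOC)

XSI = "{http://www.w3.org/2001/XMLSchema-instance}"
print(root.tag)                                     # PPSiteBuilder
print(root.get(XSI + "noNamespaceSchemaLocation"))  # PPSiteBuilderSchema.xsd
print([child.tag for child in root])  # ['site', 'reallyReallyStupidWrongTag']
```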
> assuming i'm right so far: in terms of validation, it seems that DTD
> is unwieldy and that XML Schema (.xsd) is a much better choice,
Huh??? Most people think that XML Schema is the unwieldy beast, not the DTD.
> except that there's little support for it in general, and specifically
> in python. in fact, there doesn't seem to be a whole lot of xml
> validation support at all . . . . this makes me think that:
>
> -there are other (more sensible?) ways to validate the xml, like
> parsing into DOM and then using python to validate according to your
> desires. maybe messy but obvious.
Not very feasible except for quite restricted kinds of validation, though.
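Here is a sketch of the restricted kind of DOM-based check that is feasible, using Python's xml.dom.minidom. ALLOWED_CHILDREN, StructureError and check_structure are hypothetical names for illustration. The sketch checks only which child elements may appear inside which parents; it ignores order, occurrence counts, attributes, text content and datatypes, which is the part a schema language handles for you.

```python
from xml.dom import minidom

# Hypothetical rule table (not from the original post): for each element
# name, the set of child element names it may contain.
ALLOWED_CHILDREN = {
    "PPSiteBuilder": {"site"},
    "site": set(),
}


class StructureError(Exception):
    """Raised when the document breaks a rule in the rule table."""


def check_structure(element, rules=ALLOWED_CHILDREN):
    """Recursively check an element and its descendants against rules."""
    if element.tagName not in rules:
        raise StructureError(f"undeclared element <{element.tagName}>")
    for child in element.childNodes:
        if child.nodeType != child.ELEMENT_NODE:
            continue  # skip whitespace text, comments, etc.
        if child.tagName not in rules[element.tagName]:
            raise StructureError(
                f"<{child.tagName}> not allowed inside <{element.tagName}>")
        check_structure(child, rules)


doc = minidom.parseString(
    "<PPSiteBuilder><site></site>"
    "<reallyReallyStupidWrongTag></reallyReallyStupidWrongTag>"
    "</PPSiteBuilder>")
try:
    check_structure(doc.documentElement)
except StructureError as err:
    print("invalid:", err)
```

Every new rule means more hand-written code, which is why this approach does not scale much beyond checks like this one.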
> -xml is new, validation of xml is newer, validation with XML Schema
> is newer yet.
>
> in any case, i've gotten XSV and run it against a few of my own
> examples. again, i'm confused: XSV seems to validate the XML Schema
> itself (schemaErrors) as much as the XML (instanceErrors). i guess this
> is good. moreover, i was expecting to write something like this:
>
> XSV.validate('foo.xml', 'foo.xsd')
>
> which would raise an exception if anything went wrong with the
> validation of the XML (.xml) file according to the XML Schema file
> (.xsd). instead, i get an (opaque) xml object that i will have to parse
> further, eventually to raise my own custom exceptions.
xsv comes with an xslt stylesheet to make the results easier to read.
You could start with that (command line operation) until you understand
what xsv is telling you.
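Once the report makes sense, turning it into the exception-raising call described above can be a small wrapper. This sketch (Python 3 standard library; XSVValidationError and check_xsv_report are hypothetical names, not part of xsv) does not run xsv; it assumes you already have the report text. It reads only the attributes on the report's root element as they appear in the xsv output quoted in this thread (schemaErrors, instanceErrors, instanceAssessed, validation) and, by default, also treats a fall-back to lax validation as a failure.

```python
import xml.etree.ElementTree as ET

# Namespace of xsv's report element, as shown in the output in this thread.
XSV_NS = "{http://www.w3.org/2000/05/xsv}"


class XSVValidationError(Exception):
    """Hypothetical custom exception for a failed or incomplete xsv run."""


def check_xsv_report(report_xml, allow_lax=False):
    """Raise XSVValidationError unless an xsv report shows a clean result.

    Only the attributes of the root <xsv> element are inspected. Missing
    error counts are treated as 0, which is an assumption of this sketch.
    """
    root = ET.fromstring(report_xml)
    if root.tag != XSV_NS + "xsv":
        raise XSVValidationError("not an xsv report: " + root.tag)
    schema_errors = int(root.get("schemaErrors", "0"))
    instance_errors = int(root.get("instanceErrors", "0"))
    if schema_errors:
        raise XSVValidationError(f"{schema_errors} error(s) in the schema")
    if instance_errors:
        raise XSVValidationError(f"{instance_errors} error(s) in the instance")
    if root.get("instanceAssessed") != "true":
        raise XSVValidationError("the instance was not assessed")
    if root.get("validation") == "lax" and not allow_lax:
        # Lax assessment skips elements that have no declaration, so a
        # clean lax result says little about the document's structure.
        raise XSVValidationError("xsv fell back to lax validation")


# The report quoted in this thread, trimmed to its root element.
REPORT = """<xsv xmlns="http://www.w3.org/2000/05/xsv"
     docElt="{None}PPSiteBuilder" instanceAssessed="true"
     instanceErrors="0" schemaErrors="0" validation="lax"
     target="file:///C:/sandbox/site_builder/siteBuilder.xml"/>"""

try:
    check_xsv_report(REPORT)
except XSVValidationError as err:
    print("rejected:", err)  # rejected: xsv fell back to lax validation
```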
>
> lastly, here's an example of some simple xml and an empty schema:
>
> <?xml version="1.0" encoding="UTF-16" ?>
> <PPSiteBuilder xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
>     xsi:noNamespaceSchemaLocation='PPSiteBuilderSchema.xsd'>
> <site></site>
> <reallyReallyStupidWrongTag></reallyReallyStupidWrongTag>
> </PPSiteBuilder>
>
> <?xml version="1.0"?>
> <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
> </xsd:schema>
>
> here's the xsv output:
>
> <?xml version='1.0'?>
> <xsv xmlns="http://www.w3.org/2000/05/xsv" docElt="{None}PPSiteBuilder"
> instanceAssessed="true" instanceErrors="0" schemaErrors="0"
> schemaLocs="None -> PPSiteBuilderSchema.xsd; None -> PPSiteBuilderSchema.xsd"
> target="file:///C:/sandbox/site_builder/siteBuilder.xml" validation="lax"
> version="XSV 2.5-2 of 2003/07/09 13:08:04">
> <schemaDocAttempt
> URI="file:///C:/sandbox/site_builder/PPSiteBuilderSchema.xsd"
> outcome="success" source="schemaLoc"/>
> <schemaDocAttempt
> URI="file:///C:/sandbox/site_builder/PPSiteBuilderSchema.xsd"
> outcome="redundant" source="schemaLoc"/>
> </xsv>
>
>
> XSV does not complain about this example,
But it does tell you what it did. The schema is empty, so it declares no
elements, and xsv found no declaration for the document element
PPSiteBuilder. It therefore fell back to "lax" mode, as the report says
(validation="lax"), which means it did not check the elements it found.
XML Schema validation can be either lax or strict: you have to read up
on it. With lax validation xsv found no errors, because there were no
declarations for the elements to violate.
> though none of the elements (<PPSiteBuilder>, <site>, etc.) are
> specified in the Schema. i expect i'm missing something basic about
> xml, validation, and XML Schema, but this is just the sort of *very bad*
> xml that i want to be able to catch during validation.
>
Learn how to enforce strict validation, or just use a DTD, or go to
RELAX NG. If you use a DTD, you have to use a validating parser and
tell it to validate - Python can do this. Search Google and you should
find enough information.
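In Python's SAX API, "telling it to validate" means setting the standard feature_validation flag on the parser. A minimal sketch follows (try_enable_validation is a hypothetical helper name). Note that in a current standard-library Python, xml.sax.make_parser() returns the expat-based reader, which refuses this request because expat does not validate, so DTD validation needs a separate validating parser.

```python
import xml.sax
from xml.sax.handler import feature_validation


def try_enable_validation(parser):
    """Ask a SAX parser to validate; report whether it agreed.

    feature_validation is the standard SAX2 feature name for validation.
    A parser that cannot validate raises SAXNotSupportedException, or
    SAXNotRecognizedException if it does not know the feature at all.
    """
    try:
        parser.setFeature(feature_validation, True)
    except (xml.sax.SAXNotSupportedException,
            xml.sax.SAXNotRecognizedException) as err:
        return False, str(err)
    return True, "validation enabled"


parser = xml.sax.make_parser()  # expat-based reader in a stock Python
ok, message = try_enable_validation(parser)
print(ok, message)
```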
Cheers,
Tom P