[XML-SIG] validating with XML schema (long)

Hawkeye Parker hawkeye.parker at autodesk.com
Tue Oct 14 21:38:10 EDT 2003


hi all

i need to validate the structure/content of some xml, and i'm parsing etc. with python.  i've been learning a bit about XML Schema and i'd like to confirm some basic assumptions:

-validation with XML Schema (or any other validation language) doesn't "just happen".  i.e., even if you reference an .xsd file in your xml, you still need to explicitly run a validator to check the xml against it.  is that correct?

assuming i'm right so far:  in terms of validation, it seems that DTD is unwieldy and that XML Schema (.xsd) is a much better choice, except that there's little support for it in general, and specifically in python.  in fact, there doesn't seem to be a whole lot of xml validation support at all . . . .  this makes me think that:

-there are other (more sensible?) ways to validate the xml, like parsing into DOM and then using python to validate according to your desires.  maybe messy but obvious.
-xml is new, validation of xml is newer, validation with XML Schema is newer yet.
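
the "parse into DOM and validate in python" idea could look roughly like the sketch below.  it uses the standard library's xml.dom.minidom.  ALLOWED_CHILDREN, StructureError and check_structure are hypothetical names made up for illustration, and the rule that <PPSiteBuilder> may contain only <site> is an assumption, not something the real format defines.

```python
from xml.dom import minidom

# hypothetical rule table: which child elements each element may contain
ALLOWED_CHILDREN = {
    "PPSiteBuilder": {"site"},
    "site": set(),
}


class StructureError(Exception):
    """raised when an element appears where the rule table does not allow it"""


def _check_children(element):
    # elements missing from the table are treated as allowing no children
    allowed = ALLOWED_CHILDREN.get(element.tagName, set())
    for child in element.childNodes:
        if child.nodeType != child.ELEMENT_NODE:
            continue  # skip whitespace text nodes, comments, etc.
        if child.tagName not in allowed:
            raise StructureError(
                "unexpected <%s> inside <%s>" % (child.tagName, element.tagName)
            )
        _check_children(child)


def check_structure(xml_text, root_name="PPSiteBuilder"):
    """parse xml_text into a DOM and raise StructureError on a bad element"""
    doc = minidom.parseString(xml_text)
    root = doc.documentElement
    if root.tagName != root_name:
        raise StructureError("unexpected root element <%s>" % root.tagName)
    _check_children(root)
```

for a document like <PPSiteBuilder><site/><bogus/></PPSiteBuilder> this raises StructureError naming <bogus>.  it only checks element nesting; attributes, ordering and text content would each need their own hand-written rules, which is where this approach gets messy.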

in any case, i've gotten XSV and run it against a few of my own examples.  again, i'm confused:  XSV seems to validate the XML Schema itself (schemaErrors) as much as the XML (instanceErrors).  i guess this is good.  moreover, i was expecting to write something like this:

XSV.validate('foo.xml', 'foo.xsd')

which would raise an exception if anything went wrong with the validation of the XML (.xml) file according to the XML Schema file (.xsd).  instead, i get an (opaque) xml object that i will have to parse further, eventually to raise my own custom exceptions.  not meaning to complain at all, this makes me think i should just do everything in python:  then, at least i know exactly what i'm doing.  why am i wrong?

lastly, here's an example of some simple xml and an empty schema:

<?xml version="1.0" encoding="UTF-16" ?>
<PPSiteBuilder xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xsi:noNamespaceSchemaLocation='PPSiteBuilderSchema.xsd'>
	<site></site>
	<reallyReallyStupidWrongTag></reallyReallyStupidWrongTag>
</PPSiteBuilder>

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
</xsd:schema>

here's the xsv output:

<?xml version='1.0'?>
<xsv xmlns="http://www.w3.org/2000/05/xsv" docElt="{None}PPSiteBuilder"
     instanceAssessed="true" instanceErrors="0" schemaErrors="0"
  schemaLocs="None -> PPSiteBuilderSchema.xsd; None -> PPSiteBuilderSchema.xsd"
     target="file:///C:/sandbox/site_builder/siteBuilder.xml" validation="lax"
     version="XSV 2.5-2 of 2003/07/09 13:08:04">
  <schemaDocAttempt
    URI="file:///C:/sandbox/site_builder/PPSiteBuilderSchema.xsd"
                    outcome="success" source="schemaLoc"/>
  <schemaDocAttempt
    URI="file:///C:/sandbox/site_builder/PPSiteBuilderSchema.xsd"
                    outcome="redundant" source="schemaLoc"/>
</xsv>


XSV does not complain about this example, though none of the elements (<PPSiteBuilder>, <site>, etc.) are specified in the Schema.  i expect i'm missing something basic about xml, validation, and XML Schema, but this is just the sort of *very bad* xml that i want to be able to catch during validation.
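
a minimal sketch of turning a report like the one above into an exception.  it reads only attributes that appear in that output (schemaErrors, instanceErrors, validation).  ValidationError and check_xsv_report are hypothetical names, and treating anything other than validation="strict" as a failure is an assumption about what "lax" should mean for this use case, not documented XSV behaviour.

```python
import xml.etree.ElementTree as ET

# namespace taken from the xsv output shown above
XSV_NS = "http://www.w3.org/2000/05/xsv"


class ValidationError(Exception):
    """raised when an xsv report does not describe a clean validation"""


def check_xsv_report(report_text, require_strict=True):
    """raise ValidationError unless the xsv report looks clean"""
    root = ET.fromstring(report_text)
    if root.tag != "{%s}xsv" % XSV_NS:
        raise ValueError("not an xsv report: %r" % root.tag)

    # a missing count is treated as zero (assumption)
    schema_errors = int(root.get("schemaErrors", "0"))
    instance_errors = int(root.get("instanceErrors", "0"))
    if schema_errors or instance_errors:
        raise ValidationError(
            "%d schema error(s), %d instance error(s)"
            % (schema_errors, instance_errors)
        )

    # assumption: a "lax" outcome with zero errors should not count as valid
    mode = root.get("validation")
    if require_strict and mode != "strict":
        raise ValidationError("validation was %r, not 'strict'" % mode)
```

with require_strict left on, the report above would raise ValidationError even though both error counts are 0, because its validation attribute is "lax".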

any help, thoughts, nudges would be wonderful.

thanks,
hawkeye


