Hi all

It seems that there are two ways of validating an xml document with XML Schema.

1. Create an XMLParser object and pass it an XMLSchema object as an argument. If you use this parser object with etree.parse() or etree.fromstring(), it validates the document against the schema while parsing.

2. Parse the document first, and then validate it using XMLSchema.validate() or XMLSchema.assertValid().

I have always used the first method; I was unaware of the second one. However, I have just found a particular error that is detected by the second method, but not by the first one.

The error is the use of duplicate “id” attributes, where the schema specifies the attribute as being of type “xs:ID”.

I have a document with duplicate ids. Validating with the first method passes. Validating the same document, with the same schema, using the second method, gives the following -

    lxml.etree.DocumentInvalid: Element '{http://www.omg.org/spec/BPMN/20100524/MODEL}dataOutput', attribute 'id': 'user_row_id' is not a valid value of the atomic type 'xs:ID'., line 38

Is this a bug? Should I always use the second method?

I am using lxml version 4.1.1.

Thanks

Frank Millman

P.S. This is a copy of a message I posted on 26/03/2017. I got no reply to that, so I thought I would try again. I was using version 3.6.4 then, but the behaviour is the same with 4.1.1.
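For anyone wanting to compare the two methods side by side, here is a minimal sketch. The schema and the three documents are made up for illustration (they are not the BPMN schema), and the helper names validate_while_parsing and validate_after_parsing are hypothetical. No result is claimed here for the duplicate-id document, since that is exactly the case under discussion.

```python
from lxml import etree

# Hypothetical minimal schema: <root> holds any number of <item id="..."/>,
# with the id attribute typed as xs:ID (values must be unique per document).
XSD = b"""\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="item" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute name="id" type="xs:ID" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def validate_while_parsing(xml_bytes, schema):
    """Method 1: the parser validates as it reads. True if accepted."""
    parser = etree.XMLParser(schema=schema)
    try:
        etree.fromstring(xml_bytes, parser=parser)
    except etree.XMLSyntaxError:
        return False
    return True


def validate_after_parsing(xml_bytes, schema):
    """Method 2: parse without a schema, then validate the finished tree."""
    root = etree.fromstring(xml_bytes)
    return schema.validate(root)


schema = etree.XMLSchema(etree.fromstring(XSD))
good = b'<root><item id="a"/><item id="b"/></root>'
bad = b'<root><item/></root>'  # the required id attribute is missing
dup = b'<root><item id="a"/><item id="a"/></root>'  # duplicate xs:ID value

for label, doc in [("good", good), ("bad", bad), ("dup", dup)]:
    print(label,
          validate_while_parsing(doc, schema),
          validate_after_parsing(doc, schema))
```

Each helper parses the bytes afresh, so the two methods never share a tree. If the two columns differ for "dup" on a given lxml/libxml2 build, that reproduces the discrepancy described above in a form small enough to attach to a bug report.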
On .11.2017, 12:02, Frank Millman <frank@chagford.com> wrote:
lxml.etree.DocumentInvalid: Element '{http://www.omg.org/spec/BPMN/20100524/MODEL}dataOutput', attribute 'id': 'user_row_id' is not a valid value of the atomic type 'xs:ID'., line 38 Is this a bug? Should I always use the second method
It would help to provide an example of the schema and the XML you're trying to validate and the code you're using to do this. The behaviour of the various validation methods is subtly different, with assertValid() the only one that tells you why something is not valid. Validating when parsing might write the errors to the error log.

Internally I think lxml hands the work to libxml2, so if there is a bug, it's most likely to be there. But we really need more information about what exactly you're doing.

Charlie
--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Kronenstr. 27a
Düsseldorf
D- 40217
Tel: +49-211-600-3657
Mobile: +49-178-782-6226
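On the point about finding out why something is not valid: a small sketch of where the reasons can be read in each case. The one-element schema and the helper names are hypothetical. The sketch relies on two documented lxml behaviours: XMLSchema.validate() returns a bool and leaves the details in the schema's error_log, while a parser created with schema=... raises XMLSyntaxError when the document does not validate.

```python
from lxml import etree

# Hypothetical one-element schema: <count> must contain an integer.
schema = etree.XMLSchema(etree.fromstring(b"""\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="count" type="xs:integer"/>
</xs:schema>
"""))

doc = b"<count>not a number</count>"


def reasons_after_parsing(xml_bytes):
    """validate() only returns a bool; the reasons stay in schema.error_log."""
    root = etree.fromstring(xml_bytes)
    if schema.validate(root):
        return []
    return [(entry.line, entry.message) for entry in schema.error_log]


def reason_while_parsing(xml_bytes):
    """A validating parser raises XMLSyntaxError; its text carries the reason."""
    parser = etree.XMLParser(schema=schema)
    try:
        etree.fromstring(xml_bytes, parser=parser)
    except etree.XMLSyntaxError as exc:
        return str(exc)
    return None


for line, message in reasons_after_parsing(doc):
    print("post-parse:", line, message)
print("parse-time:", reason_while_parsing(doc))
```

So assertValid() is the most convenient route to the message, but under these assumptions it does not look like the only one.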
On 22.11.2017, 13:14 PM, Charlie Clark wrote:
On .11.2017, 12:02, Frank Millman <frank@chagford.com> wrote:
lxml.etree.DocumentInvalid: Element '{http://www.omg.org/spec/BPMN/20100524/MODEL}dataOutput', attribute 'id': 'user_row_id' is not a valid value of the atomic type 'xs:ID'., line 38 Is this a bug? Should I always use the second method
It would help to provide an example of the schema and the XML you're trying to validate and the code you're using to do this. The behaviour of the various validation methods is subtly different, with assertValid() the only one that tells you why something is not valid. Validating when parsing might write the errors to the error log.
Internally I think lxml hands the work to libxml2, so if there is a bug, it's most likely to be there. But we really need more information about what exactly you're doing.
I tried to reduce this to a simple example, but in that case the parsing validator correctly picked up the duplicate id, so I see what you mean about ‘subtly different’.

I don’t want to waste anyone’s time on this, as this is not critical to me. However, for interest, here is a bit of info.

The schema is large and complex (to me, anyway). There are a number of xsd files. The links can be found here -

    http://www.omg.org/spec/BPMN/2.0/About-BPMN/

The example that I used for testing is as follows -

    <xsd:element name="scriptTask" type="tScriptTask" substitutionGroup="flowElement"/>
    <xsd:complexType name="tScriptTask">
      <xsd:complexContent>
        <xsd:extension base="tTask">
          <xsd:sequence>
            <xsd:element ref="script" minOccurs="0" maxOccurs="1"/>
          </xsd:sequence>
          <xsd:attribute name="scriptFormat" type="xsd:string"/>
        </xsd:extension>
      </xsd:complexContent>
    </xsd:complexType>

As you can see, it comes from flowElement -

    <xsd:element name="flowElement" type="tFlowElement"/>
    <xsd:complexType name="tFlowElement" abstract="true">
      <xsd:complexContent>
        <xsd:extension base="tBaseElement">
          <xsd:sequence>
            <xsd:element ref="auditing" minOccurs="0" maxOccurs="1"/>
            <xsd:element ref="monitoring" minOccurs="0" maxOccurs="1"/>
            <xsd:element name="categoryValueRef" type="xsd:QName" minOccurs="0" maxOccurs="unbounded"/>
          </xsd:sequence>
          <xsd:attribute name="name" type="xsd:string"/>
        </xsd:extension>
      </xsd:complexContent>
    </xsd:complexType>

This one comes from baseElement -

    <xsd:element name="baseElement" type="tBaseElement"/>
    <xsd:complexType name="tBaseElement" abstract="true">
      <xsd:sequence>
        <xsd:element ref="documentation" minOccurs="0" maxOccurs="unbounded"/>
        <xsd:element ref="extensionElements" minOccurs="0" maxOccurs="1" />
      </xsd:sequence>
      <xsd:attribute name="id" type="xsd:ID" use="optional"/>
      <xsd:anyAttribute namespace="##other" processContents="lax"/>
    </xsd:complexType>

baseElement defines the “id” attribute that is causing the problem.
In my xml file, I have this -

    <semantic:scriptTask id="task_AfterLogin" name="AfterLogin task">
      [...]
    </semantic:scriptTask>
    <semantic:scriptTask id="task_CancelLogin" name="CancelLogin task">
      [...]
    </semantic:scriptTask>

To test, I simply changed the second id to be the same as the first one.

This is the code that I used -

1. Validate while parsing

    from lxml import etree

    # The schema is handed to the parser, so validation happens during parsing.
    parser = etree.XMLParser(
        schema=etree.XMLSchema(file='bpmn20/BPMN20.xsd'),
        attribute_defaults=True, remove_comments=True, remove_blank_text=True)
    xml = open('login_proc.xml').read()
    elem = etree.fromstring(xml, parser=parser)

This did not pick up the error.

2. Parse, then validate

    from lxml import etree

    # The parser has no schema; the finished tree is validated afterwards.
    parser = etree.XMLParser(
        attribute_defaults=True, remove_comments=True, remove_blank_text=True)
    schema = etree.XMLSchema(file='bpmn20/BPMN20.xsd')
    xml = open('login_proc.xml').read()
    elem = etree.fromstring(xml, parser=parser)
    schema.assertValid(elem)

This did pick up the error.

Comments welcome, but as I said, this is not critical to me.

Thanks

Frank
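If the parse-time validator cannot be relied on for this one check, the duplicate ids can also be found directly, independent of schema validation. A minimal sketch follows. The helper name duplicate_ids and the wrapping <proc> element are hypothetical, and it assumes the attribute is literally named "id", whereas the schema's uniqueness rule applies to any attribute typed xs:ID, whatever its name.

```python
from collections import Counter

from lxml import etree


def duplicate_ids(root, attr="id"):
    """Return the sorted values of `attr` that occur on more than one element."""
    counts = Counter(
        el.get(attr) for el in root.iter("*") if el.get(attr) is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)


# Made-up fragment in the style of the document above, with one repeated id.
sample = etree.fromstring(
    b'<proc xmlns:semantic="http://www.omg.org/spec/BPMN/20100524/MODEL">'
    b'<semantic:scriptTask id="task_AfterLogin"/>'
    b'<semantic:scriptTask id="task_AfterLogin"/>'
    b'<semantic:scriptTask id="task_CancelLogin"/>'
    b'</proc>'
)
print(duplicate_ids(sample))  # ['task_AfterLogin']
```

This is a workaround rather than a replacement for validation: it says nothing about whether the values are otherwise legal xs:ID names.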