On 22.11.2017, 13:14, Charlie Clark wrote:
On .11.2017, 12:02, Frank Millman <frank@chagford.com> wrote:
    lxml.etree.DocumentInvalid: Element '{http://www.omg.org/spec/BPMN/20100524/MODEL}dataOutput', attribute 'id': 'user_row_id' is not a valid value of the atomic type 'xs:ID'., line 38

Is this a bug? Should I always use the second method?
It would help if you could provide an example of the schema, the XML you're trying to validate, and the code you're using to do this. The behaviour of the various validation methods is subtly different, with assertValid() the only one that tells you why something is not valid. Validating when parsing might write the errors to the error log.
Internally I think lxml hands the work to libxml2, so if there is a bug, it's most likely to be there. But we really need more information about what exactly you're doing.
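A minimal sketch of how the tree-validation calls differ, using a tiny hypothetical schema (the items/item elements and the report() helper are invented for illustration, not part of BPMN or lxml). validate() only returns True or False and leaves the details in the schema's error_log, while assertValid() raises etree.DocumentInvalid carrying the message. The message text itself comes from libxml2 and may vary between versions.

```python
from lxml import etree

# Hypothetical schema (not BPMN): <items> holding <item> elements whose
# "id" attribute is typed xs:ID, so each value must be a valid NCName.
XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="items">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="item" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute name="id" type="xs:ID" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = etree.XMLSchema(etree.fromstring(XSD))


def report(xml_bytes):
    """Parse without a schema, then show what each validation call reports."""
    tree = etree.fromstring(xml_bytes)

    # validate() only answers yes/no; the reasons are left in schema.error_log.
    ok = schema.validate(tree)
    log = [(entry.line, entry.message) for entry in schema.error_log]

    # assertValid() raises DocumentInvalid, whose text describes the problem.
    try:
        schema.assertValid(tree)
        raised = None
    except etree.DocumentInvalid as exc:
        raised = str(exc)
    return ok, log, raised


if __name__ == "__main__":
    # "1bad" starts with a digit, so it is not a valid NCName / xs:ID value.
    ok, log, raised = report(b'<items><item id="1bad"/></items>')
    print(ok)  # False
    print(log)
    print(raised)
```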
I tried to reduce this to a simple example, but in that case the parsing validator correctly picked up the duplicate id, so I see what you mean about ‘subtly different’.

I don’t want to waste anyone’s time on this, as this is not critical to me. However, for interest, here is a bit of info.

The schema is large and complex (to me, anyway). There are a number of xsd files. The links can be found here - http://www.omg.org/spec/BPMN/2.0/About-BPMN/

The example that I used for testing is as follows -

    <xsd:element name="scriptTask" type="tScriptTask" substitutionGroup="flowElement"/>
    <xsd:complexType name="tScriptTask">
        <xsd:complexContent>
            <xsd:extension base="tTask">
                <xsd:sequence>
                    <xsd:element ref="script" minOccurs="0" maxOccurs="1"/>
                </xsd:sequence>
                <xsd:attribute name="scriptFormat" type="xsd:string"/>
            </xsd:extension>
        </xsd:complexContent>
    </xsd:complexType>

As you can see, it comes from flowElement -

    <xsd:element name="flowElement" type="tFlowElement"/>
    <xsd:complexType name="tFlowElement" abstract="true">
        <xsd:complexContent>
            <xsd:extension base="tBaseElement">
                <xsd:sequence>
                    <xsd:element ref="auditing" minOccurs="0" maxOccurs="1"/>
                    <xsd:element ref="monitoring" minOccurs="0" maxOccurs="1"/>
                    <xsd:element name="categoryValueRef" type="xsd:QName" minOccurs="0" maxOccurs="unbounded"/>
                </xsd:sequence>
                <xsd:attribute name="name" type="xsd:string"/>
            </xsd:extension>
        </xsd:complexContent>
    </xsd:complexType>

This one comes from baseElement -

    <xsd:element name="baseElement" type="tBaseElement"/>
    <xsd:complexType name="tBaseElement" abstract="true">
        <xsd:sequence>
            <xsd:element ref="documentation" minOccurs="0" maxOccurs="unbounded"/>
            <xsd:element ref="extensionElements" minOccurs="0" maxOccurs="1" />
        </xsd:sequence>
        <xsd:attribute name="id" type="xsd:ID" use="optional"/>
        <xsd:anyAttribute namespace="##other" processContents="lax"/>
    </xsd:complexType>

baseElement defines the “id” attribute that is causing the problem.
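Because the id attribute is declared once on the abstract base type and inherited through xsd:extension, every derived element draws on the same document-wide pool of xs:ID values. The sketch below uses a hypothetical miniature schema (tBase, tTask, tData, process, task, data and the id_errors() helper are invented names, not the BPMN types) to show that a clash between two different element types is reported when the parsed tree is validated.

```python
from lxml import etree

# Hypothetical miniature of the BPMN pattern (names invented): an abstract
# base type declares an optional xs:ID "id", and two types inherit it.
XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="tBase" abstract="true">
    <xs:attribute name="id" type="xs:ID" use="optional"/>
  </xs:complexType>
  <xs:complexType name="tTask">
    <xs:complexContent>
      <xs:extension base="tBase">
        <xs:attribute name="name" type="xs:string"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
  <xs:complexType name="tData">
    <xs:complexContent>
      <xs:extension base="tBase"/>
    </xs:complexContent>
  </xs:complexType>
  <xs:element name="process">
    <xs:complexType>
      <xs:choice maxOccurs="unbounded">
        <xs:element name="task" type="tTask"/>
        <xs:element name="data" type="tData"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = etree.XMLSchema(etree.fromstring(XSD))


def id_errors(xml_bytes):
    """Parse without a schema, validate the tree, and return any error messages."""
    tree = etree.fromstring(xml_bytes)
    schema.validate(tree)
    return [entry.message for entry in schema.error_log]


if __name__ == "__main__":
    # A task and a data element share one id: both inherit xs:ID from tBase,
    # so the value has to be unique across the whole document.
    print(id_errors(b'<process><task id="shared" name="t"/><data id="shared"/></process>'))
    print(id_errors(b'<process><task id="t1" name="t"/><data id="d1"/></process>'))  # []
```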
In my xml file, I have this -

    <semantic:scriptTask id="task_AfterLogin" name="AfterLogin task">
        [...]
    </semantic:scriptTask>
    <semantic:scriptTask id="task_CancelLogin" name="CancelLogin task">
        [...]
    </semantic:scriptTask>

To test, I simply changed the second id to be the same as the first one. This is the code that I used -

1. Validate while parsing

    from lxml import etree

    parser = etree.XMLParser(
        schema=etree.XMLSchema(file='bpmn20/BPMN20.xsd'),
        attribute_defaults=True,
        remove_comments=True,
        remove_blank_text=True)
    with open('login_proc.xml', 'rb') as f:  # bytes, so an XML encoding declaration is accepted
        xml = f.read()
    elem = etree.fromstring(xml, parser=parser)

This did not pick up the error.

2. Parse, then validate

    parser = etree.XMLParser(
        attribute_defaults=True,
        remove_comments=True,
        remove_blank_text=True)
    schema = etree.XMLSchema(file='bpmn20/BPMN20.xsd')
    with open('login_proc.xml', 'rb') as f:
        xml = f.read()
    elem = etree.fromstring(xml, parser=parser)
    schema.assertValid(elem)

This did pick up the error.

Comments welcome, but as I said, this is not critical to me.

Thanks

Frank
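The two approaches can be compared in miniature with a hypothetical stand-in schema (the process/task names and the helpers validate_while_parsing() and parse_then_validate() are invented for illustration), reusing the same parser options as the code in the message. It relies on lxml reporting parse-time validation failures as etree.XMLSyntaxError, while assertValid() raises etree.DocumentInvalid. With a small example the parsing validator was reported to catch the duplicate too, while with the full BPMN schema it did not, so this is a sketch for narrowing down the difference, not a reproduction of it; the parse-time result is printed rather than assumed.

```python
from lxml import etree

# Hypothetical stand-in for the BPMN schema: <task> elements whose "id"
# attribute is typed xs:ID, like the "id" inherited from tBaseElement.
XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="process">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="task" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute name="id" type="xs:ID"/>
            <xs:attribute name="name" type="xs:string"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

# Two tasks deliberately sharing one id, mirroring the test in the message.
DOC = b"""<process>
  <task id="task_AfterLogin" name="AfterLogin task"/>
  <task id="task_AfterLogin" name="CancelLogin task"/>
</process>"""

schema = etree.XMLSchema(etree.fromstring(XSD))
OPTIONS = dict(attribute_defaults=True, remove_comments=True, remove_blank_text=True)


def validate_while_parsing(xml_bytes):
    """Method 1: attach the schema to the parser. Returns error text or None."""
    parser = etree.XMLParser(schema=schema, **OPTIONS)
    try:
        etree.fromstring(xml_bytes, parser=parser)
    except etree.XMLSyntaxError as exc:  # parse-time validation failures land here
        return str(exc)
    return None


def parse_then_validate(xml_bytes):
    """Method 2: parse without a schema, then validate the finished tree."""
    elem = etree.fromstring(xml_bytes, parser=etree.XMLParser(**OPTIONS))
    try:
        schema.assertValid(elem)
    except etree.DocumentInvalid as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print("while parsing:", validate_while_parsing(DOC))
    print("after parsing:", parse_then_validate(DOC))
```

If the first line prints None while the second prints an error, that is the same discrepancy described in the message, on a schema small enough to attach to a bug report.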