> Hi,
>
> I have created the following example files to test this.
>
> 2-base.xsd
>
>     <?xml version="1.0" encoding="UTF-8"?>
>     <!DOCTYPE schema [
>       <!ENTITY lowalpha "a-z">
>       <!ENTITY hialpha "A-Z">
>       <!ENTITY alpha "&lowalpha;&hialpha;">
>       <!ENTITY digit "0-9">
>       <!ENTITY uword "([&digit;]{1,4}|[1-5][&digit;]{4}|6[0-4][&digit;]{3}|65[0-4][&digit;]{2}|655[0-2][&digit;]|6553[0-5])">
>       <!ENTITY Port ":&uword;">
>     ]>
>     <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="urn:paul:types2" targetNamespace="urn:paul:types2" elementFormDefault="qualified" attributeFormDefault="unqualified">
>       <xs:simpleType name="PortType">
>         <xs:restriction base="xs:string">
>           <xs:pattern value="&Port;"/>
>         </xs:restriction>
>       </xs:simpleType>
>     </xs:schema>
>
> 2-main.xsd
>
>     <?xml version="1.0" encoding="UTF-8"?>
>     <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="urn:paul:main2" xmlns:t2="urn:paul:types2" targetNamespace="urn:paul:main2" elementFormDefault="qualified" attributeFormDefault="unqualified">
>       <xs:import namespace="urn:paul:types2" schemaLocation="2-base.xsd"/>
>       <xs:element name="Port" type="t2:PortType"/>
>     </xs:schema>
>
> 2.xml
>
>     <?xml version="1.0" encoding="UTF-8"?>
>     <Port xmlns="urn:paul:main2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:paul:main2 2-main.xsd">:10100</Port>
>
> 2.xml validates OK in XMLSpy, and if the value is changed to :101001, for example, a match error is given, so we know the pattern is being picked up through the xs:import.
>
> 2.xml does not validate when used against lxml:
>
>     from lxml import etree
>
>     with open('2-main.xsd', 'r') as schema_file:
>         my_schema=etree.XMLSchema(etree.parse(schema_file))
>
>     filename='2.xml'
>     with open(filename) as file:
>         my_xml=etree.parse(file)
>         my_schema.assertValid(my_xml)
>
> This gives the following error:
>
>     Traceback (most recent call last):
>       File "G:\lxml-test\test1b.py", line 11, in <module>
>         my_schema.assertValid(my_xml)
>       File "src\lxml\etree.pyx", line 3623, in lxml.etree._Validator.assertValid
>     lxml.etree.DocumentInvalid: Element '{urn:paul:main1}Port': [facet 'pattern'] The value ':10100' is not accepted by the pattern '[]+'., line 2
>
> Is this something that 'should' be supported by lxml? If so, am I missing something in the validation?

This should work, and I can't reproduce your problem, i.e. your given sample files work for me just fine (lxml 4.6.2, libxml2 2.9.7, libxslt 1.1.29).

But I think you're actually using different data, according to this output:

    Element '{urn:paul:main1}Port'

Note the namespace, which differs from the one in the 2.xml sample. Maybe some typo or difference between your runs with XMLSpy and with lxml is causing the headaches?

Sidenote (but you probably know this): there is no need to open() the files yourself, lxml.etree does it just fine:

    # validate.py
    from lxml import etree

    schema_tree = etree.parse('2-main.xsd')
    my_schema = etree.XMLSchema(schema_tree)

    my_xml = etree.parse('2.xml')
    result = my_schema.assertValid(my_xml)

Best regards,
Holger

Landesbank Baden-Wuerttemberg
Public-law institution (Anstalt des oeffentlichen Rechts)
Head offices: Stuttgart, Karlsruhe, Mannheim, Mainz
HRA 12704 Amtsgericht Stuttgart
HRA 4356, HRA 104 440 Amtsgericht Mannheim
HRA 40687 Amtsgericht Mainz
LBBW processes your personal data in accordance with the requirements of the GDPR. Information is available at https://www.lbbw.de/datenschutz.
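As an independent cross-check of what `&Port;` is expected to expand to, the base schema can be parsed with Python's standard-library ElementTree, which uses expat rather than libxml2. This is only a sketch: the 2-base.xsd content is inlined as a string (without the XML declaration), the names `BASE_XSD`, `XS` and `pattern_value` are illustrative, and the result says nothing about how lxml/libxml2 treat the same file.

```python
import xml.etree.ElementTree as ET

# Content of 2-base.xsd from the post, inlined for the sketch.
# The entities live in the internal DTD subset, so no external file is needed.
BASE_XSD = """<!DOCTYPE schema [
  <!ENTITY lowalpha "a-z">
  <!ENTITY hialpha "A-Z">
  <!ENTITY alpha "&lowalpha;&hialpha;">
  <!ENTITY digit "0-9">
  <!ENTITY uword "([&digit;]{1,4}|[1-5][&digit;]{4}|6[0-4][&digit;]{3}|65[0-4][&digit;]{2}|655[0-2][&digit;]|6553[0-5])">
  <!ENTITY Port ":&uword;">
]>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="urn:paul:types2"
           targetNamespace="urn:paul:types2">
  <xs:simpleType name="PortType">
    <xs:restriction base="xs:string">
      <xs:pattern value="&Port;"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>"""

XS = "{http://www.w3.org/2001/XMLSchema}"

# The parser expands internal entities inside attribute values,
# so the attribute holds the fully expanded regular expression.
root = ET.fromstring(BASE_XSD)
pattern_value = root.find(f".//{XS}pattern").get("value")
print(pattern_value)
```

If the printed value is the full alternation of digit ranges while the validator reports a pattern such as `'[]+'`, the schema that was actually loaded is probably not the one shown in the post.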
Thank you Holger for your reply.

I have further refined the schemas for validation, in an attempt to better organize reusable files. Now I have this:

entities.dtd

    <!ENTITY digit "0-9">
    <!ENTITY uword "([&digit;]{1,4}|[1-5][&digit;]{4}|6[0-4][&digit;]{3}|65[0-4][&digit;]{2}|655[0-2][&digit;]|6553[0-5])">
    <!ENTITY Port ":&uword;">

1-base.xsd

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE schema SYSTEM "entities.dtd">
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="urn:paul:types1" targetNamespace="urn:paul:types1" elementFormDefault="qualified" attributeFormDefault="unqualified">
      <xs:simpleType name="PortType">
        <xs:restriction base="xs:string">
          <xs:pattern value="&Port;"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:schema>

1-main.xsd

    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="urn:paul:main1" xmlns:t="urn:paul:types1" targetNamespace="urn:paul:main1" elementFormDefault="qualified" attributeFormDefault="unqualified">
      <xs:import namespace="urn:paul:types1" schemaLocation="1-base.xsd"/>
      <xs:element name="Port" type="t:PortType"/>
    </xs:schema>

1.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <Port xmlns="urn:paul:main1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:paul:main1 1-main.xsd">:10100</Port>

With XMLSpy, 1.xml validates OK and proper range checks on <Port> occur, but the following lxml script fails:

    import lxml
    from lxml import etree

    with open('1-main.xsd', 'r') as schema_file:
        mainschema=etree.parse(schema_file)
        my_schema=etree.XMLSchema(mainschema)

    with open('1.xml') as file:
        my_xml=etree.parse(file)
        my_schema.assertValid(my_xml)

The error message given is:

    Traceback (most recent call last):
      File "G:\lxml-test\test1.py", line 19, in <module>
        my_schema.assertValid(my_xml)
      File "src\lxml\etree.pyx", line 3623, in lxml.etree._Validator.assertValid
    lxml.etree.DocumentInvalid: Element '{urn:paul:main1}Port': [facet 'pattern'] The value ':10100' is not accepted by the pattern ''., line 2

So it appears that the &Port; entity is not being read in from the external DTD referenced by <!DOCTYPE schema SYSTEM "entities.dtd">.

Thanks
Paul

From: Holger.Joukl@LBBW.de <Holger.Joukl@LBBW.de>
Sent: 24 March 2021 16:47
To: lxml@lxml.de
Subject: [lxml] Re: Using DTD ENTITY with lxml
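The range check that the `&Port;` pattern is meant to perform can be reproduced outside any schema validator. The sketch below assumes the entities are expanded by hand into a single expression and evaluated with Python's `re` module; `PORT_PATTERN` and `is_valid_port` are illustrative names, not part of lxml. XSD patterns must match the whole value, so `re.fullmatch` is used. XSD and Python regular-expression syntax differ in general, but this particular pattern only uses constructs common to both.

```python
import re

# &Port; with &uword; and &digit; expanded by hand:
# a colon followed by an unsigned 16-bit number (0..65535).
PORT_PATTERN = (
    r":([0-9]{1,4}"      # 0..9999 (leading zeros allowed)
    r"|[1-5][0-9]{4}"    # 10000..59999
    r"|6[0-4][0-9]{3}"   # 60000..64999
    r"|65[0-4][0-9]{2}"  # 65000..65499
    r"|655[0-2][0-9]"    # 65500..65529
    r"|6553[0-5])"       # 65530..65535
)


def is_valid_port(text: str) -> bool:
    """Return True if the whole string matches the port pattern."""
    return re.fullmatch(PORT_PATTERN, text) is not None


for candidate in (":10100", ":101001", ":65535", ":65536"):
    print(candidate, is_valid_port(candidate))
```

An empty pattern, as reported in the traceback, only accepts the empty string, which is consistent with `:10100` being rejected when the entity expands to nothing.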
participants (2)
- Holger.Joukl@LBBW.de
- Paul Higgs