[lxml-dev] Bug with whitespace in namespaces
Hi, it is possible to create invalid XML with lxml:
>>> import lxml.etree
>>> import lxml.objectify
>>> xml = lxml.objectify.XML('<a/>')
>>> xml.set('{a b}c', 'foo')  # This should fail!
>>> lxml.etree.tostring(xml)
'<a xmlns:ns0="a b" ns0:c="foo"/>'
>>> lxml.objectify.fromstring(lxml.etree.tostring(xml))
Traceback (most recent call last):
  ...
  File "parser.pxi", line 625, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:64741)
  File "parser.pxi", line 565, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:64084)
lxml.etree.XMLSyntaxError: xmlns:ns0: 'a b' is not a valid URI, line 1, column 13
>>>
Regards,
--
Christian Zagrodnick · cz@gocept.com
gocept gmbh & co. kg · forsterstraße 29 · 06112 halle (saale) · germany
http://gocept.com · tel +49 345 1229889 4 · fax +49 345 1229889 1
Zope and Plone consulting and development
Christian Zagrodnick wrote:
it is possible to create invalid XML with lxml:
>>> import lxml.etree
>>> import lxml.objectify
>>> xml = lxml.objectify.XML('<a/>')
>>> xml.set('{a b}c', 'foo')  # This should fail!
>>> lxml.etree.tostring(xml)
'<a xmlns:ns0="a b" ns0:c="foo"/>'
>>> lxml.objectify.fromstring(lxml.etree.tostring(xml))
Traceback (most recent call last):
  ...
  File "parser.pxi", line 625, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:64741)
  File "parser.pxi", line 565, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:64084)
lxml.etree.XMLSyntaxError: xmlns:ns0: 'a b' is not a valid URI, line 1, column 13
Well, URI checking is actually a new feature in libxml2 2.7 (IIRC), that's why it wasn't used before. Newer libxml2 versions are strict about RFC 3986 syntax, so I agree that it would make sense to also check namespace URIs on the way in.

This should go into lxml 2.3.

Stefan
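To illustrate what checking namespace URIs "on the way in" could look like, here is a minimal sketch in plain Python. It is not lxml's actual implementation: the helper names `split_clark` and `check_namespace_uri` are hypothetical, and the check only rejects characters that RFC 3986 does not allow anywhere in a URI (such as the space in 'a b'); it does not validate the full URI grammar.

```python
import re

# Characters RFC 3986 permits somewhere in a URI reference:
# unreserved, reserved (gen-delims and sub-delims), and "%" for escapes.
# Assumption: a character-level check is enough for this illustration.
_URI_CHARS = re.compile(r"[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]*\Z")


def split_clark(tag):
    """Split a '{namespace}local' name into (namespace, local).

    The namespace is None for an unqualified name.
    """
    if tag.startswith("{"):
        ns, sep, local = tag[1:].partition("}")
        if not sep:
            raise ValueError("missing closing brace in %r" % tag)
        return ns, local
    return None, tag


def check_namespace_uri(tag):
    """Return (namespace, local), rejecting namespaces with illegal characters."""
    ns, local = split_clark(tag)
    if ns is not None and not _URI_CHARS.match(ns):
        raise ValueError("invalid namespace URI: %r" % ns)
    return ns, local
```

Called before an attribute or tag name is accepted, such a check would reject '{a b}c' with a ValueError instead of letting it reach the serializer, while '{http://example.com/ns}c' would pass through unchanged.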
Stefan Behnel wrote:
This should go into lxml 2.3.
Fixed on the trunk. Stefan
participants (2)
- Christian Zagrodnick
- Stefan Behnel