Hi, Rob Sanderson wrote:
The null character makes the XML non-well-formed anyway.
The legal character ranges for XML (as per the spec, section 2.2):
Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
Definitely no \x00!
that's true. While you could get away on the XML /generator/ side with adding an Entity (and lxml 2.0 will let you do that), this will just let you write out broken XML that the recipient will not be able to parse:
from lxml import etree as et el = et.Element("test") el.text = "mind the " el.append(et.Entity("#0")) xml = et.tostring(el) '<test>mind the </test>'
et.fromstring(xml) Traceback (most recent call last): lxml.etree.XMLSyntaxError: xmlParseCharRef: invalid xmlChar value 0, line 1,
column 20
Maybe we should fix the Entity() factory here to prevent such misuse... Stefan