[lxml-dev] Problem with lxml-1.1.2 and binary text nodes

I seem to recall that lxml used to raise an exception if binary data was put into a text node of an XML element. Was this change intentional? Is there any way to use lxml to check for document well-formedness before sending out XML? thanks... -nld

Hi,
Narayan Desai wrote:
> I seem to recall that lxml used to raise an exception if binary data was put into a text node of an XML element. Was this change intentional? Is there any way to use lxml to check for document well-formedness before sending out XML?
With 'binary' you mean 'containing 0-bytes', right?
It looks like we have a general problem with passing such strings to libxml2:
>>> from lxml.etree import *
>>> r = XML("<test/>")
>>> r.text = "a\0b"
>>> print repr(tostring(r))
a
I guess it would be better to just raise an exception in this case; however, that would require us to walk through every character of each string we are passed. I'm not sure it's worth it. Any comments?
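For reference, the character walk described above can be sketched in a few lines on the caller's side. This is a hypothetical helper (the names `find_invalid_xml_char` and `check_xml_text` are illustrative, not part of lxml's API) and it assumes Python 3. It flags any character outside the XML 1.0 `Char` production, which covers "\0" as well as the other C0 control characters, lone surrogates, U+FFFE and U+FFFF, none of which may appear in an XML 1.0 document.

```python
import re

# Matches any character NOT allowed by the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_char(text):
    """Return the index of the first character not allowed in XML 1.0, or -1."""
    match = _INVALID_XML_CHARS.search(text)
    return match.start() if match else -1


def check_xml_text(text):
    """Return text unchanged, or raise ValueError if it cannot go into XML 1.0."""
    pos = find_invalid_xml_char(text)
    if pos != -1:
        raise ValueError(
            "character %r at index %d is not allowed in XML" % (text[pos], pos)
        )
    return text
```

A caller could run `check_xml_text()` on strings before assigning them to an element's `.text`. The regex search is still a linear scan over the string, which is exactly the per-character cost mentioned above.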
Stefan

Stefan Behnel wrote:
> Narayan Desai wrote:
>> I seem to recall that lxml used to raise an exception if binary data was put into a text node of an XML element. Was this change intentional? Is there any way to use lxml to check for document well-formedness before sending out XML?
> With 'binary' you mean 'containing 0-bytes', right?
> It looks like we have a general problem with passing such strings to libxml2:
> >>> from lxml.etree import *
> >>> r = XML("<test/>")
> >>> r.text = "a\0b"
> >>> tostring(r)
> 'a'
This is now caught by lxml. You will get an AssertionError if you pass strings containing "\0" bytes to any of the API functions. You also get an XMLSyntaxError if you pass such a string to the parser (as was already the case).
I think that's reasonable behaviour.
Have fun, Stefan