
Hi,

Konstantin Ryabitsev wrote:
I'm having trouble with the following case. One of my automatic import scripts takes data from one source and submits it to another as an XML feed. Recently, it started failing because one of the entries contains a null character. The test case is:
from lxml.etree import Element

sourcestr = 'Contains a null: \x00'
unistr = unicode(sourcestr, 'utf-8')
elt = Element('foo').text = unistr
Running it will cause the following error:
Traceback (most recent call last):
  File "foo.py", line 6, in <module>
    elt = Element('foo').text = unistr
  File "etree.pyx", line 741, in etree._Element.text.__set__
  File "apihelpers.pxi", line 344, in etree._setNodeText
  File "apihelpers.pxi", line 648, in etree._utf8
AssertionError: All strings must be XML compatible, either Unicode or ASCII
Can someone suggest the best way to deal with this?
My first question is: why do you need a '\x00' here? If you want to pass binary data in XML, the best way is to use a safe encoding such as uuencode or whatever. That should be part of your XML language spec/schema/...

Stefan
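
To illustrate the two options implied above, here is a minimal sketch, not a definitive fix. It assumes Python 3 and uses the standard library's xml.etree.ElementTree so that it runs without extra packages; lxml.etree offers the same Element and tostring calls. The helper names (strip_illegal, binary_element, read_binary) and the encoding="base64" attribute are made up for this example, and base64 is used here as one possible safe encoding in place of uuencode.

```python
import base64
import re
import xml.etree.ElementTree as ET  # lxml.etree could be swapped in here

# Control characters that XML 1.0 does not allow in character data
# (tab, newline and carriage return are allowed and left alone).
_XML_ILLEGAL = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f]')


def strip_illegal(text):
    """Option 1: the null is junk, so drop it before building the tree."""
    return _XML_ILLEGAL.sub('', text)


def binary_element(tag, payload):
    """Option 2: the bytes matter, so carry them base64-encoded.

    The 'encoding' attribute is a hypothetical convention; both ends of
    the feed would have to agree on it in their schema.
    """
    elt = ET.Element(tag, encoding='base64')
    elt.text = base64.b64encode(payload).decode('ascii')
    return elt


def read_binary(elt):
    """Recover the original bytes on the receiving side."""
    return base64.b64decode(elt.text)


if __name__ == '__main__':
    cleaned = ET.Element('foo')
    cleaned.text = strip_illegal('Contains a null: \x00')
    print(ET.tostring(cleaned))

    encoded = binary_element('foo', b'Contains a null: \x00')
    print(ET.tostring(encoded))
    print(read_binary(encoded))
```

Which option fits depends on the answer to the question above: if the null is an artifact of the source data, stripping it is simpler; if it is real binary content, the encoding has to be part of the agreed format so the consumer knows to decode it.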