Jens,
thanks for the hints, but I still do not understand how to solve the
problem I have.
Here are a couple of steps that show it more clearly:
RSSroot = etree.Element('rss')
# (spaces added inside '& # 200;' only so that the actual characters are shown here)
etree.SubElement(RSSroot, 'title').text = '& # 200;'
print etree.tostring(RSSroot)
and I get
<rss><title>&amp;#200;</title></rss>
so the '&' gets escaped, whereas I wanted the special
character È to come through unchanged ...
Roberto
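
For reference, a minimal sketch of the behaviour described above, under stated assumptions: it uses Python 3 and the standard library's xml.etree.ElementTree rather than lxml (lxml.etree offers the same Element, SubElement and tostring calls), and the helper build_rss is a hypothetical name introduced only for illustration. It contrasts assigning the six-character string '&#200;' as text with assigning the character U+00C8 itself.

```python
# Assumption: stdlib ElementTree stands in for lxml here; with lxml the import
# would be `from lxml import etree` and the three calls used below are the same.
import xml.etree.ElementTree as etree


def build_rss(title_text):
    """Hypothetical helper: build <rss><title>...</title></rss>."""
    root = etree.Element('rss')
    etree.SubElement(root, 'title').text = title_text
    return root


# Case 1: the text is the literal string '&#200;'.
# The '&' is ordinary character data, so the serializer escapes it.
escaped = etree.tostring(build_rss('&#200;'))

# Case 2: the text is the character itself (U+00C8, written as an escape
# so that this source stays pure ASCII).
# With the default ASCII output it is written as a character reference ...
as_ascii = etree.tostring(build_rss('\u00c8'))
# ... and with encoding='unicode' the result is a str holding the raw character.
as_text = etree.tostring(build_rss('\u00c8'), encoding='unicode')

print(escaped)
print(as_ascii)
print(as_text)
```

If the goal is for È itself to appear in the output, the second form combined with a non-ASCII encoding looks like the one to use; the first form stores a literal ampersand, which an XML serializer has to escape.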
On Fri, Feb 26, 2010 at 10:39 AM, Jens Quade <jq(a)qdevelop.de> wrote:
>
> On 26.02.2010, at 09:08, roby.brunelli(a)gmail.com wrote:
>
>> I'm trying to write an RSS file (extracting information from an html page) using
>>
>> etree.ElementTree(..).write(..)
>>
>> When I create the description part of a news I insert text with special characters such as:
>>
>> È
>>
>> and when I print (or write to file) the corresponding element, I get
>>
>> &#200;
>>
>> which I do not want (I want the original special char): is there a way to prevent this kind of mapping??
>
>>>> import sys
>>>> from lxml import etree
>
>>>> x = etree.XML('<test>ü</test>')
>>>> etree.ElementTree(x).write(sys.stdout)
> <test>&#252;</test>
>
>>>> etree.ElementTree(x).write(sys.stdout, encoding='utf-8')
> <test>ü</test>
>
> also:
>
>>>> print etree.tostring(x,encoding='utf-8')
> <test>ü</test>
>
>
> default encoding is ascii.
>
>