Re: [lxml-dev] Problem writing special HTML characters &#...
Jens, thanks for the hints, but I still do not understand how to solve the problem I have. Just a couple of steps to better show it:
RSSroot = etree.Element('rss')
etree.SubElement(RSSroot, 'title').text = '& # 200;' # space between & # added here just to make sure the actual chars are shown
print etree.tostring(RSSroot)
and I get
<rss><title>&amp;#200;</title></rss>
so the '&' turns out to be sanitized, while I wanted the special character È to go along ...
Roberto
On Fri, Feb 26, 2010 at 10:39 AM, Jens Quade <jq@qdevelop.de> wrote:
On 26.02.2010, at 09:08, roby.brunelli@gmail.com wrote:
I'm trying to write an RSS file (extracting information from an html page) using
etree.ElementTree(..).write(..)
When I create the description part of a news I insert text with special characters such as:
&#200;
and when I print (or write to file) the corresponding element, I get
&amp;#200;
which I do not want (I want the original special char): is there a way to prevent this kind of mapping??
>>> import sys
>>> from lxml import etree
>>> x = etree.XML('<test>ü</test>')
>>> etree.ElementTree(x).write(sys.stdout)
<test>&#252;</test>
>>> etree.ElementTree(x).write(sys.stdout, encoding='utf-8')
<test>ü</test>
also:
>>> print etree.tostring(x, encoding='utf-8')
<test>ü</test>
The default encoding is ASCII.
Hi,
please don't top-post.
Roberto Brunelli, 26.02.2010 12:20:
thanks for the hints, but I still do not understand how to solve the problem I have. Just a couple of steps to better show it:
RSSroot = etree.Element('rss')
etree.SubElement(RSSroot, 'title').text = '& # 200;' # space between & # added here just to make sure the actual chars are shown
print etree.tostring(RSSroot)
and I get
<rss><title>&amp;#200;</title></rss>
so the '&' turns out to be sanitized, while I wanted the special character È to go along ...
So, what is it that you want in the serialised XML: '&#200;' or 'È'? Jens showed you how to get both.
Stefan
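One way to read the exchange above: assigning the string '&#200;' to `.text` stores six literal characters, so the serialiser has to escape the ampersand. Putting the character È itself into the tree lets the output encoding decide how it is written. The sketch below illustrates this in Python 3 and is not code from the thread; `raw` is a hypothetical value scraped from an HTML page, `html.unescape` is one possible way to decode it, and the stdlib fallback import is an assumption so the snippet runs without lxml.

```python
import html

try:
    from lxml import etree
except ImportError:
    # Assumption: the stdlib ElementTree handles these calls the same way.
    import xml.etree.ElementTree as etree

raw = '&#200;'                 # hypothetical text as it appears in HTML source
decoded = html.unescape(raw)   # the single character 'È'

rss = etree.Element('rss')
title = etree.SubElement(rss, 'title')

# Literal '&#200;' as text: the '&' is escaped on output.
title.text = raw
escaped = etree.tostring(rss)

# The real character as text: written as a character reference in ASCII
# output, or as the character itself in unicode/UTF-8 output.
title.text = decoded
reference = etree.tostring(rss)
literal = etree.tostring(rss, encoding='unicode')

print(escaped)
print(reference)
print(literal)
```

Either of the last two forms is read back by an XML parser as the same text, È.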