ElementTree.fromstring(unicode_html)
Fredrik Lundh
fredrik at pythonware.com
Sun Jan 27 13:35:53 EST 2008
globophobe wrote:
> In [1]: unicode_html = u'\u3055\u3080\u3044\uff0f\r\n\u3064\u3081\u305f\u3044\r\n'
>
> I need to turn this into an elementtree, but some of the data is
> japanese whereas the rest is html. This string contains a <br />.
where? <br /> is an element, not a character. "\r" and "\n" are
characters, not elements.
If you want to build a tree where "\r\n" is replaced with a <br />
element, you can encode the string as UTF-8, use the replace method to
insert the element, and then call fromstring.
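A minimal sketch of that first approach, for illustration only. It assumes the fragment is wrapped in a root element (called "data" here, an arbitrary name) because fromstring needs a single root, and that the text contains no "&" or "<" that would need escaping first. Note that the trailing "\r\n" also turns into a <br />, so this tree ends with an empty line break.

```python
import xml.etree.ElementTree as ET

unicode_html = u'\u3055\u3080\u3044\uff0f\r\n\u3064\u3081\u305f\u3044\r\n'

# encode first, then swap each line break for a <br /> element
encoded = unicode_html.encode("utf-8")
marked_up = encoded.replace(b"\r\n", b"<br />")

# "data" is an assumed wrapper tag; fromstring needs exactly one root element
root = ET.fromstring(b"<data>" + marked_up + b"</data>")

# list the child elements that the replace step produced
print([child.tag for child in root])
```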
Alternatively, you can build the tree yourself:
    import xml.etree.ElementTree as ET

    unicode_html = u'\u3055\u3080\u3044\uff0f\r\n\u3064\u3081\u305f\u3044\r\n'

    # one list item per line, with the line endings stripped
    parts = unicode_html.splitlines()

    elem = ET.Element("data")
    elem.text = parts[0]
    for part in parts[1:]:
        # the text that follows each <br /> goes in that element's tail
        ET.SubElement(elem, "br").tail = part

    print(ET.tostring(elem))
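The same idea can be wrapped in a small helper, sketched here under a few assumptions: the name lines_to_tree and its root_tag parameter are made up for illustration, and encoding="unicode" is a Python 3 option that makes tostring return text rather than bytes.

```python
import xml.etree.ElementTree as ET


def lines_to_tree(text, root_tag="data"):
    """Build an element in which each line break becomes a <br /> child.

    Hypothetical helper; root_tag is an arbitrary wrapper name.
    """
    # splitlines() returns [] for an empty string, so fall back to one empty part
    parts = text.splitlines() or [""]
    elem = ET.Element(root_tag)
    elem.text = parts[0]
    for part in parts[1:]:
        # text after a <br /> is stored as that element's tail
        ET.SubElement(elem, "br").tail = part
    return elem


elem = lines_to_tree(u'\u3055\u3080\u3044\uff0f\r\n\u3064\u3081\u305f\u3044\r\n')

# Python 3 only: serialize to a text string instead of encoded bytes
serialized = ET.tostring(elem, encoding="unicode")
```

Because splitlines drops the final line ending, this version produces a single <br /> for the sample string, with the second line stored as its tail.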
</F>