unicode html
Stefan Behnel
stefan.behnel-n05pAM at web.de
Tue Jul 18 02:43:58 EDT 2006
lorenzo.viscanti at gmail.com wrote:
> Hi, I've found lots of material on the net about unicode html
> conversions, but still i'm having many problems converting unicode
> characters to html entities. Is there any available function to solve
> this issue?
> As an example I would like to do this kind of conversion:
> \uc3B4 => &ocirc;
> for all available html entities.
I don't know how you generate your HTML, but ElementTree and lxml both have
good HTML parsers. You can parse the document with one of them and then
serialise the result with a "US-ASCII" encoding: every character that is
not ASCII is then written out as a numeric character reference (such as
&#244;).
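>>> from lxml import etree
>>> root = etree.HTML(my_html_data)  # my_html_data: your HTML as a string
>>> # current lxml releases accept the encoding as a keyword argument only
>>> html_7_bit = etree.tostring(root, encoding="us-ascii")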
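
If you only have a plain string rather than a whole document, something
along these lines might also do, using nothing but the standard library.
This is just a rough sketch, assuming Python 3: to_html_entities is a
made-up helper name, not part of lxml or ElementTree, and it leaves ASCII
markup characters such as & and < untouched. The "xmlcharrefreplace" error
handler gives numeric references, while html.entities.codepoint2name
supplies the named entities where they exist.

```python
from html.entities import codepoint2name


def to_html_entities(text):
    """Replace each non-ASCII character with an HTML entity.

    Hypothetical helper for illustration: uses a named entity where
    the standard table has one, a numeric reference otherwise.
    """
    parts = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            # Plain ASCII passes through unchanged (no escaping of & or <).
            parts.append(ch)
        elif code in codepoint2name:
            # Named entity, e.g. U+00F4 becomes &ocirc;
            parts.append("&%s;" % codepoint2name[code])
        else:
            # No named entity available: fall back to a numeric reference.
            parts.append("&#%d;" % code)
    return "".join(parts)


# Numeric references only, via the built-in codec error handler.
numeric = "caf\u00e9 \u00f4".encode("ascii", "xmlcharrefreplace").decode("ascii")

# Named entities where possible, numeric otherwise.
named = to_html_entities("caf\u00e9 \u00f4 \u4e2d")
```

Here numeric should contain only numeric references, and named should use
&eacute; and &ocirc; for the accented letters while falling back to a
numeric reference for the last character, which has no named entity.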
Stefan