BeautifulSoup - converting Unicode to numerical representation

S.Selvam Siva s.selvamsiva at gmail.com
Mon Feb 9 14:48:48 CET 2009


Hi all,

I need to parse feeds and post the data to SOLR. I want the special
characters (Unicode characters) to be posted in their numerical representation.

For example:
’ --> &#8217; (for which the HTML equivalent is &rsquo;)
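All three forms in that example refer to the same code point, U+2019 RIGHT SINGLE QUOTATION MARK. A quick way to check the correspondence, assuming Python 3's standard html.entities module (the post itself predates Python 3 and uses BeautifulSoup):

```python
from html.entities import codepoint2name, name2codepoint

# U+2019 RIGHT SINGLE QUOTATION MARK, the typographic apostrophe
ch = "\u2019"

code = ord(ch)                           # decimal code point used in &#8217;
print(code)                              # 8217
print(f"&#{code};")                      # &#8217;
print(codepoint2name[code])              # rsquo (the name in &rsquo;)
print(name2codepoint["rsquo"] == code)   # True
```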
I used BeautifulSoup, which seems to allow conversion from "&#xxxx;" (numeric
values) to Unicode characters, as follows:

from BeautifulSoup import BeautifulStoneSoup  # BeautifulSoup 3 (Python 2)

hdes = str(BeautifulStoneSoup(strdesc,
           convertEntities=BeautifulStoneSoup.HTML_ENTITIES))
xdesc = str(BeautifulStoneSoup(hdes,
            convertEntities=BeautifulStoneSoup.XML_ENTITIES))
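If the code can run on Python 3.4 or later (an assumption; BeautifulStoneSoup belongs to BeautifulSoup 3, which is Python 2 only), the standard library's html.unescape() decodes named, decimal and hexadecimal references in one pass, which covers what the two convertEntities passes above do. The input string here is a made-up feed snippet:

```python
import html

# Hypothetical feed snippet mixing a named, a decimal and a hex reference
strdesc = "It&rsquo;s &#8220;quoted&#x201D; &amp; done"

# html.unescape() resolves HTML5 named refs plus &#NNNN; and &#xHHHH;
xdesc = html.unescape(strdesc)
# xdesc -> 'It’s “quoted” & done'
```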

But I want the *numerical representation of Unicode characters*.
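Going the other way, Python's built-in "xmlcharrefreplace" codec error handler replaces every character the target encoding cannot represent with a decimal &#NNNN; reference. Encoding to ASCII therefore turns every non-ASCII character into its numeric form. A minimal sketch in Python 3 (in Python 2 the same encode() call works on a unicode object); the function name is hypothetical:

```python
def to_numeric_refs(text: str) -> str:
    """Replace every non-ASCII character with a decimal &#NNNN; reference."""
    # ASCII cannot encode anything above U+007F, so the error handler
    # emits &#<decimal code point>; for each such character.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(to_numeric_refs("It\u2019s \u201cquoted\u201d"))
# It&#8217;s &#8220;quoted&#8221;
```

Plain ASCII text passes through unchanged, so the call is safe to apply to every field.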
I also want to convert HTML entities like &rsquo; to their numeric
equivalent, &#8217;.
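To rewrite only named references such as &rsquo; as numeric ones, one option is a regular expression plus the standard library's html.entities.name2codepoint table. That table covers only the HTML 4 entity names, so unknown names are left as they are. Keeping the five XML predefined entities (&amp;, &lt;, &gt;, &quot;, &apos;) untouched is a design choice of this sketch, not something the post asked for; the names NAMED_REF and named_to_numeric are hypothetical:

```python
import re
from html.entities import name2codepoint

# Matches a named reference like &rsquo; (it must start with a letter,
# so numeric refs such as &#8217; or &#x2019; are never matched)
NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

# Entities every XML parser already understands; keep them as-is
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

def named_to_numeric(text: str) -> str:
    """Rewrite &name; as &#NNNN; when the name is a known HTML 4 entity."""
    def repl(m):
        name = m.group(1)
        if name in XML_PREDEFINED:
            return m.group(0)
        code = name2codepoint.get(name)
        # Leave unknown names unchanged rather than guessing
        return f"&#{code};" if code is not None else m.group(0)
    return NAMED_REF.sub(repl, text)

print(named_to_numeric("It&rsquo;s &amp; &foo; &#8220;"))
# It&#8217;s &amp; &foo; &#8220;
```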

Thanks in advance.

*Note:*
The reason for this requirement is that I need a standard way to post to
SOLR that avoids errors.
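One possible "standard form" for posting is: decode every reference to plain Unicode, re-escape the characters XML treats specially, then emit all non-ASCII characters as decimal references. This assumes the documents reach SOLR as XML, which the post does not state; the function name is hypothetical:

```python
import html
from xml.sax.saxutils import escape

def normalize_for_xml(text: str) -> str:
    """Return text with &, <, > escaped and all non-ASCII as &#NNNN;."""
    # 1. Decode named, decimal and hex references to plain Unicode
    decoded = html.unescape(text)
    # 2. Re-escape &, < and > so the result is safe inside an XML element
    escaped = escape(decoded)
    # 3. Encode everything outside ASCII as a decimal character reference
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(normalize_for_xml("It&rsquo;s \u201cfine\u201d &amp; 5 &lt; 6"))
# It&#8217;s &#8220;fine&#8221; &amp; 5 &lt; 6
```

The order matters: escaping before the encode step means the & in each generated &#NNNN; is not escaped a second time. Control characters that XML 1.0 forbids (such as U+0000 to U+0008) are not handled here and would still need to be filtered out.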
-- 
Yours,
S.Selvam


More information about the Python-list mailing list