[Tutor] Html entities, beautiful soup and unicode
cheesman at titan.physx.u-szeged.hu
Tue Jan 19 08:49:27 CET 2010
I'm using Beautiful Soup to rip the UK headlines from the UK BBC page.
This works rather well, but there is the problem of HTML entities, which
appear in the XML feed.
Is there an elegant/simple way to convert them into the "standard"
output? By this I mean &pound; going to £. Or do I have to use a regexp?
And where does Unicode fit into all of this?
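One possible approach, sketched under the assumption of Python 3.4 or later (which postdates this 2010 post): the standard library's html.unescape replaces named and numeric character references (&pound;, &#163;, &#xA3;) with the Unicode characters they stand for, so no regexp is needed. decode_entities is a hypothetical helper name, and the sample string is made up for illustration. According to its documentation, Beautiful Soup 4 also converts entities to Unicode characters while parsing. That may differ from the version used here.

```python
import html


def decode_entities(text: str) -> str:
    """Replace HTML character references with the Unicode characters they
    name (hypothetical helper; a thin wrapper around html.unescape)."""
    return html.unescape(text)


# Made-up feed text for illustration only.
raw = "Sample headline: costs &pound;5 &amp; up"
headline = decode_entities(raw)
print(headline)  # Sample headline: costs £5 & up

# headline is a Unicode str. Encode it only when bytes are needed,
# e.g. when writing to a file or a terminal with a fixed encoding.
data = headline.encode("utf-8")
print(data)  # b'Sample headline: costs \xc2\xa35 & up'

# Decoding UTF-8 bytes with the wrong codec produces mojibake:
# the UTF-8 bytes for £ (C2 A3) read as Latin-1 show up as "Â£".
print(data.decode("latin-1"))  # Sample headline: costs Â£5 & up
```

Where Unicode fits: keep text as str (Unicode) inside the program and choose an encoding only at the input and output boundaries. Stray "Â" characters are a common symptom of UTF-8 bytes being displayed as Latin-1.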
Thanks for your help