[Tutor] Html entities, beautiful soup and unicode

andy cheesman at titan.physx.u-szeged.hu
Tue Jan 19 08:49:27 CET 2010


Hi people

I'm using Beautiful Soup to rip the UK headlines from the BBC's UK page.
This works rather well, but there is the problem of HTML entities, which
appear in the XML feed.
Is there an elegant/simple way to convert them into the "standard"
output? By this I mean &pound; going to £, or do I have to use a regexp?
And where does Unicode fit into all of this?

Thanks for your help

Andy 
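
For illustration, here is a minimal sketch of the kind of conversion the question above asks about. It assumes a modern Python 3 interpreter and uses only the standard library's html.unescape rather than Beautiful Soup or a regexp; the function name decode_entities and the sample headline are made up for the example and do not come from the feed.

```python
import html


def decode_entities(text):
    """Return text with HTML entities replaced by the characters they name.

    html.unescape handles named entities (&pound;), decimal references
    (&#163;) and hexadecimal references (&#xA3;) in a single pass.
    """
    return html.unescape(text)


# Hypothetical headline text, standing in for a string pulled from the feed.
headline = "Petrol hits &pound;1.10 a litre &amp; rising"

decoded = decode_entities(headline)
print(decoded)

# This is where Unicode fits in: the decoded result is a Unicode string,
# and it only becomes bytes when encoded for output, for example as UTF-8.
encoded = decoded.encode("utf-8")
print(encoded)
```

The entity is just a way of spelling a character in markup; once decoded, the pound sign is an ordinary Unicode character, and the choice of byte encoding is only made when the text is written out.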

