[Tutor] Removing GB pound symbols from Beautiful soup output

Andy cheesman at titan.physx.u-szeged.hu
Fri Jul 16 16:16:58 CEST 2010


Dear Nice people

I've been using Beautiful Soup to filter the BBC's RSS feed. However, 
the BBC recently changed the feed, and it is causing me problems 
with the pound (money) symbol. The initial error was "UnicodeEncodeError: 
'ascii' codec can't encode character u'\xa3'", which means that the 
default encoding can't represent this (Unicode) character. I was having 
similar problems with HTML characters appearing, but I used a simple 
regex to remove them or substitute something suitable.
I tried applying the same approach with a generic regex pattern 
(re.compile(u"""\u\[A-Fa-f0-9\]\{4\}""")), but this fails because it 
doesn't follow the standard pattern for ASCII. I'm not sure that I fully 
understand the Unicode system, but is there a simple way to 
remove or substitute these non-ASCII characters?
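
For illustration, the kind of thing I'm after might look like the rough
sketch below. It is only a guess at an approach, not something taken from
Beautiful Soup itself: the names NON_ASCII, REPLACEMENTS, to_ascii and
strip_non_ascii are made up for this example, and the "GBP " substitute
is just an assumption about what a suitable replacement could be. The
idea is to match the characters themselves (anything outside the ASCII
range) rather than their \uXXXX escape text.

```python
import re

# Matches any single character outside the 7-bit ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7f]")

# Hypothetical substitution table; the "GBP " text is an assumption.
REPLACEMENTS = {u"\xa3": u"GBP "}


def to_ascii(text, replacements=REPLACEMENTS):
    """Substitute known characters, then drop any other non-ASCII ones."""
    for char, substitute in replacements.items():
        text = text.replace(char, substitute)
    return NON_ASCII.sub(u"", text)


def strip_non_ascii(text):
    """Drop every non-ASCII character by using the codec's 'ignore' handler."""
    return text.encode("ascii", "ignore").decode("ascii")


sample = u"Fuel rises to \xa31.20 a litre"
print(to_ascii(sample))
print(strip_non_ascii(sample))
```

The first function keeps the meaning of the pound sign by replacing it
before anything is stripped; the second simply throws away whatever
ASCII cannot represent.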

Thanks for any help!

Andy

