[Tutor] Removing GB pound symbols from Beautiful soup output
Andy
cheesman at titan.physx.u-szeged.hu
Fri Jul 16 16:16:58 CEST 2010
Dear Nice people
I've been using Beautiful Soup to filter the BBC's RSS feed. However,
the BBC recently changed the feed, and it is causing me problems
with the pound (money) symbol. The initial error was "UnicodeEncodeError:
'ascii' codec can't encode character u'\xa3'", which means that the
default encoding can't process this (Unicode) character. I was having
similar problems with HTML characters appearing, but I used a simple
regex system to remove them or substitute something suitable.
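To make the error concrete, here is a minimal sketch that reproduces it
outside of the feed. The sample title and the helper name try_ascii are
made up for illustration (the real feed text is not shown here), and the
snippet is written to run on Python 3, where the message quotes the
character as '\xa3' rather than u'\xa3'.

```python
# Hypothetical sample text; the actual feed content is not shown in this post.
title = u"Petrol rises to \xa31.20 a litre"


def try_ascii(text):
    """Return the ASCII-encoded bytes, or the error message if encoding fails."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError as exc:
        # U+00A3 (the pound sign) has no ASCII equivalent, so encoding fails.
        return str(exc)


print(try_ascii(title))
```

The failure happens at the point where the text is encoded to ASCII (for
example when printing or writing it out), not when the feed is parsed.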
I tried applying the same approach and making a generic regex pattern
(re.compile(u"""\u\[A-Fa-f0-9\]\{4\}""")), but this fails because it
doesn't follow the standard pattern for ASCII. I'm not sure that I 100%
understand the Unicode system, but is there a simple way to
remove/substitute these non-ASCII characters?
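For reference, a rough sketch of one possible approach, not necessarily
the best one. A \uXXXX escape only exists in the source code; the string
itself holds the single character, so a pattern has to match the
character rather than a backslash followed by hex digits. The names
REPLACEMENTS, NON_ASCII and to_ascii, and the choice of "GBP" as the
substitute, are assumptions for illustration. Written to run on Python 3.

```python
import re

# Hypothetical replacement table; "GBP" is only an assumed substitute.
REPLACEMENTS = {u"\xa3": u"GBP"}

# Matches any single character outside the 7-bit ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7f]")


def to_ascii(text):
    """Substitute known characters, then drop any other non-ASCII ones."""
    def replace(match):
        # Characters missing from the table are replaced by an empty string.
        return REPLACEMENTS.get(match.group(0), u"")
    return NON_ASCII.sub(replace, text)


print(to_ascii(u"Price: \xa35 for caf\xe9 entry"))
```

If no substitution is needed at all, text.encode("ascii", "ignore") drops
every non-ASCII character, and "replace" in place of "ignore" puts a "?"
where each one was.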
Thanks for any help!
Andy