Clean "Durty" strings

Marc 'BlackJack' Rintsch bj_666 at
Mon Apr 2 19:47:43 CEST 2007

In <1175530649.060784.147900 at>, irstas wrote:

> I'd like to see how this transformation can be done with
> BeautifulSoup. Well, the last two regexps can be replaced with this:
> unicode(BeautifulStoneSoup(s,convertEntities=BeautifulStoneSoup.HTML_ENTITIES).contents[0])

Completely without regular expressions:

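from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3.x

def main():
    soup = BeautifulSoup(source, convertEntities=BeautifulSoup.HTML_ENTITIES)
    # Join all text nodes, then collapse each whitespace run to one space.
    print ' '.join(''.join(soup(text=True)).split())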
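
For anyone without BeautifulSoup at hand, roughly the same idea can be sketched with only the standard library. This is not the BeautifulSoup approach itself but a swapped-in approximation using Python 3's `html.parser`. The names `TextCollector` and `clean` are made up for illustration, and it assumes that dropping every tag and keeping every text node is what is wanted (text inside script or style elements is kept as well).

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: collect text nodes and ignore all tags."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; before
        # handle_data() is called.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of text between tags.
        self.chunks.append(data)


def clean(source):
    """Return the text of `source` with tags removed and whitespace collapsed."""
    parser = TextCollector()
    parser.feed(source)
    parser.close()  # flush any text still buffered by the parser
    # Same idiom as above: join the text nodes, then let split()/join()
    # collapse every run of whitespace into a single space.
    return ' '.join(''.join(parser.chunks).split())
```

With this sketch, `clean('<p>Hello &amp;   <b>world</b>\n</p>')` gives `'Hello & world'`.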

	Marc 'BlackJack' Rintsch
