[Tutor] BeautifulSoup - deleting tags
jonasmg at softhome.net
Tue Mar 28 13:35:38 CEST 2006
Kent Johnson writes:
> jonasmg at softhome.net wrote:
>> Is it possible to delete all tags from a text, and if so, how?
>>
>> For example:
>>
>> s='<td><a href="..." title="...">foo bar</a>;<br />
>> <a href="..." title="...">foo2</a> <a href="..."
>> title="...">bar2</a></td>'
>>
>> so that I would get only: foo bar, foo2, bar2
>
> How about this?
>
> In [1]: import BeautifulSoup
>
> In [2]: s=BeautifulSoup.BeautifulSoup('''<td><a href="..." title="...">foo bar</a>;<br />
>    ...: <a href="..." title="...">foo2</a> <a href="..." title="...">bar2</a></td>''')
>
> In [4]: ' '.join(i.string for i in s.fetch() if i.string)
> Out[4]: 'foo bar foo2 bar2'
>
>
> Here are a couple of tag strippers that don't use BS:
> http://www.aminus.org/rbre/python/cleanhtml.py
> http://www.oluyede.org/blog/2006/02/13/html-stripper/
>
> Kent
>
Another way (valid only for this case, where all the wanted text is inside <a> tags):

    for i in s.fetch('a'): print i.string
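
For comparison, here is a minimal sketch of a tag stripper that does not use BeautifulSoup at all, in the spirit of the links Kent mentioned. It is not taken from either of those pages; it assumes Python 3 and the standard library's html.parser module, and the names TextExtractor and strip_tags are made up for this example. Unlike the i.string approach, it keeps every piece of text between tags, so the stray ";" after the first link shows up too.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text found between tags, discarding the tags themselves.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Skip chunks that are only whitespace (e.g. newlines between tags).
        text = data.strip()
        if text:
            self.chunks.append(text)


def strip_tags(html):
    """Return the non-blank text chunks of an HTML fragment, in order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


s = '''<td><a href="..." title="...">foo bar</a>;<br />
<a href="..." title="...">foo2</a> <a href="..." title="...">bar2</a></td>'''

print(strip_tags(s))
```

If only the link text is wanted, the ";" chunk would have to be filtered out afterwards, or the parser extended to record text only while inside an <a> tag.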