[Tutor] extract uri from beautiful soup string

Sander Sweers sander.sweers at gmail.com
Mon Oct 15 03:02:15 CEST 2012


Sander Sweers wrote on Mon 15-10-2012 at 02:35 [+0200]:
> > On Mon, Oct 15, 2012 at 12:12 AM, Sander Sweers <sander.sweers at gmail.com> wrote:
> > > Norman Khine wrote on Sun 14-10-2012 at 23:10 [+0100]:
> > Norman Khine wrote on Mon 15-10-2012 at 00:17 [+0100]:
> > i tried this: http://pastie.org/5059153

Btw, if I understand what you are trying to do, then you can do this much
more simply. I noticed that all the <a> tags with an onclick attribute have
an href attribute of '#'. To get all of these, do something like:

soup.findAll('a', {'href':'#'})

Then read the onclick value from each tag's attrMap and take the first
single-quoted string in it, e.g. attrMap['onclick'].split('\'')[1].
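
To see why index 1 is the piece you want, here is a small sketch on a
made-up onclick value. The URL and window options below are assumptions
for illustration, not taken from the page in question. Splitting on the
single quote leaves the text before the first quote at index 0 and the
first quoted string at index 1.

```python
# Hypothetical onclick value; the string on the real page may differ.
onclick = "window.open('http://example.com/doc.pdf','popup','toolbar=0,width=600')"

parts = onclick.split("'")
# parts[0] is everything before the first quote: "window.open("
# parts[1] is the first quoted string, which here is the URI.
uri = parts[1]
print(uri)
```

This only works when the URI is the first quoted argument in the onclick
string; if the page puts something else first, the index has to change.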

Put together, that may look like this:

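# soup is assumed to be an already-parsed BeautifulSoup document
for i in soup.findAll('a', {'href':'#'}):
    # keep only the links whose onclick opens a window with toolbar=0
    if 'toolbar=0' in i.attrMap['onclick']:
        print i.attrMap['onclick'].split('\'')[1]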
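
If you want to try the same idea without BeautifulSoup installed, a rough
equivalent can be sketched with the standard library's html.parser. Note
that this swaps in a different parser and uses Python 3 syntax; the class
name OnclickUriParser and the sample HTML are made up for the example.

```python
from html.parser import HTMLParser


class OnclickUriParser(HTMLParser):
    """Collect the first quoted onclick string from <a href="#"> tags."""

    def __init__(self):
        super().__init__()
        self.uris = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        attrs = dict(attrs)
        # A missing onclick is treated as an empty string.
        onclick = attrs.get('onclick') or ''
        # Same two conditions as above: href is '#' and onclick has toolbar=0.
        if attrs.get('href') == '#' and 'toolbar=0' in onclick:
            self.uris.append(onclick.split("'")[1])


# Hypothetical markup standing in for the real page.
html = '''
<a href="#" onclick="window.open('http://example.com/a.pdf','w','toolbar=0')">A</a>
<a href="#" onclick="return false;">B</a>
<a href="/plain">C</a>
'''

parser = OnclickUriParser()
parser.feed(html)
print(parser.uris)
```

Only the first link passes both checks, so only its URI is collected.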

Greets
Sander


