[Tutor] extract uri from beautiful soup string
Sander Sweers
sander.sweers at gmail.com
Mon Oct 15 03:02:15 CEST 2012
Sander Sweers wrote on Mon 15-10-2012 at 02:35 [+0200]:
> > On Mon, Oct 15, 2012 at 12:12 AM, Sander Sweers <sander.sweers at gmail.com> wrote:
> > > Norman Khine wrote on Sun 14-10-2012 at 23:10 [+0100]:
> > Norman Khine wrote on Mon 15-10-2012 at 00:17 [+0100]:
> > i tried this: http://pastie.org/5059153
Btw, if I understand what you are trying to do, you can do this much
more simply. I noticed that all the a tags with onclick have an href
attribute of '#'. To get all of these, do something like:
soup.findAll('a', {'href':'#'})
Then use the attrMap, e.g. attrMap['onclick'].split('\'')[1], which
gives the text between the first pair of single quotes in the onclick
value.
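To see why index 1 is the interesting one, here is a small sketch on a
made-up onclick value. The URL and the window.open arguments are
assumptions for illustration, not copied from the actual page:

```python
# Hypothetical onclick value; the string on the real page may differ.
onclick = "window.open('http://example.com/doc.pdf', 'popup', 'toolbar=0,width=600'); return false;"

# Splitting on the single quote leaves the text before the first quote
# at index 0 and the text between the first pair of quotes at index 1.
parts = onclick.split("'")
uri = parts[1]
print(uri)
```

This only works as long as the URI is the first single-quoted string in
the attribute.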
Put together, that may look like the following:
for i in soup.findAll('a', {'href':'#'}):
    if 'toolbar=0' in i.attrMap['onclick']:
        print i.attrMap['onclick'].split('\'')[1]
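If BeautifulSoup is not at hand, the same filter can be sketched with
only the standard library's HTMLParser on Python 3. This is a different
technique from the findAll loop above, shown only as an illustration;
the class name OnclickUriParser and the sample markup are made up:

```python
from html.parser import HTMLParser


class OnclickUriParser(HTMLParser):
    """Collect the first quoted string from matching onclick attributes."""

    def __init__(self):
        super().__init__()
        self.uris = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        onclick = attr_map.get('onclick') or ''
        # Same filter as the loop above: <a href="#"> whose onclick
        # mentions toolbar=0.
        if tag == 'a' and attr_map.get('href') == '#' and 'toolbar=0' in onclick:
            self.uris.append(onclick.split("'")[1])


# Made-up sample markup, not the page from this thread.
sample = '''
<a href="#" onclick="window.open('http://example.com/a.pdf', 'w', 'toolbar=0'); return false;">A</a>
<a href="#" onclick="doSomething('other')">B</a>
<a href="/plain">C</a>
'''

parser = OnclickUriParser()
parser.feed(sample)
print(parser.uris)
```

Using .get() for the attributes means a tag without onclick is simply
skipped instead of raising a KeyError.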
Greets
Sander