how to find not the next sibling but the 2nd sibling or find sibling "a" OR sibling "b"

localpricemaps at gmail.com localpricemaps at gmail.com
Thu Jan 19 11:20:05 EST 2006


I actually realized there are three possible class names: food, drink,
or dessert. So my question is whether I can alter your function to look
like this:

def isFoodOrDrinkOrDessert(attr):
    return attr in ['food', 'drink', 'dessert']
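
For reference, here is a minimal sketch of how a three-class predicate like this might be plugged into the sibling search. It assumes the current bs4 package (where findNextSibling is spelled find_next_sibling) rather than the Beautiful Soup release used in this thread. The HTML sample and the names is_menu_class and find_links are made up for illustration.

```python
# Hypothetical markup modelled on the snippet quoted in this thread.
HTML = """
<div class="noFood">Cheese</div>
<div class="dessert">Blue</div>
<a class="btn" href="http://www.cnn.com">link</a>
"""

MENU_CLASSES = {'food', 'drink', 'dessert'}


def is_menu_class(attr):
    # Beautiful Soup calls this with a class value, or with None
    # when the tag has no class attribute at all.
    return attr in MENU_CLASSES


def find_links(html):
    # Assumption: the third-party bs4 package is installed.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, 'html.parser')
    links = []
    for incident in soup.find_all('div', {'class': 'noFood'}):
        # The dictionary value is a callable instead of a fixed string.
        b = incident.find_next_sibling('div', {'class': is_menu_class})
        if b is None:
            continue
        n = b.find_next_sibling('a', {'class': 'btn'})
        if n is not None:
            links.append(n['href'])
    return links
```

Calling find_links(HTML) walks every noFood div and collects the href of the btn link that follows a matching div. Rows with no matching sibling are skipped rather than raising an error.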


Thanks in advance for the help.

Kent Johnson wrote:
> localpricemaps at gmail.com wrote:
> > I have some HTML that looks like this, and I want to scrape out the
> > href value (the www.cnn.com part):
> >
> > <div class="noFood">Cheese</div>
> > <div class="food">Blue</div>
> > <a class="btn" href = "http://www.cnn.com">
> >
> >
> > so i wrote this code which scrapes it perfectly:
> >
> > for incident in row('div', {'class':'noFood'}):
> >     b = incident.findNextSibling('div', {'class': 'food'})
> >     print b
> >     n = b.findNextSibling('a', {'class': 'btn'})
> >     print n
> >     link = n['href'] + "','"
> >
> > The problem is that the 2nd tag, the <div class="food"> tag, is
> > sometimes called food and sometimes called drink.
>
> Apparently you are using Beautiful Soup. The value in the attribute
> dictionary can be a callable; try this:
>
> def isFoodOrDrink(attr):
>    return attr in ['food', 'drink']
>
> b = incident.findNextSibling('div', {'class': isFoodOrDrink})
>
> Alternatively, you could omit the class spec and check for it in code.
> 
> Kent
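
The second suggestion (omit the class spec and check for it in code) could be sketched roughly as below. This again assumes the current bs4 package, where a tag's class attribute comes back as a list of names. The helper names next_menu_div and demo, and the sample markup, are hypothetical.

```python
MENU_CLASSES = frozenset(['food', 'drink', 'dessert'])


def next_menu_div(tag):
    """Return the first following <div> sibling with a menu class, or None."""
    # No class filter here: every following <div> sibling is inspected.
    for sib in tag.find_next_siblings('div'):
        # bs4 returns the class attribute as a list of class names.
        if MENU_CLASSES.intersection(sib.get('class', [])):
            return sib
    return None


def demo():
    # Assumption: the third-party bs4 package is installed.
    from bs4 import BeautifulSoup

    html = ('<div class="noFood">Cheese</div>'
            '<div class="other">skip</div>'
            '<div class="drink">Blue</div>'
            '<a class="btn" href="http://www.cnn.com">link</a>')
    soup = BeautifulSoup(html, 'html.parser')
    start = soup.find('div', {'class': 'noFood'})
    return next_menu_div(start).get_text()
```

Checking in code takes a few more lines than the callable, but it leaves room for extra conditions, such as skipping unrelated divs that sit between the two tags.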



