how to find not the next sibling but the 2nd sibling, or find sibling "a" OR sibling "b"
localpricemaps at gmail.com
Thu Jan 19 11:20:05 EST 2006
I actually realized there are three possible class names: food, drink,
or dessert. So my question is whether I can alter your function to
look like this:
def isFoodOrDrinkOrDessert(attr):
    return attr in ['food', 'drink', 'dessert']
Thanks in advance for the help.
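For reference, a change like this should work: Beautiful Soup only needs a callable that returns True for the attribute values you want, so the list can hold as many class names as needed. The sketch below builds such a predicate from any number of names. `make_class_matcher` is a hypothetical helper, not part of Beautiful Soup, and it assumes the filter is called with one class string (or None) at a time. The names must match the HTML exactly, so 'desert' would never match class="dessert".

```python
def make_class_matcher(*names):
    """Return a predicate that is True when an attribute value is one of names.

    Hypothetical helper: Beautiful Soup accepts a callable as an attribute
    filter, so the returned function can be passed as {'class': matcher}.
    """
    wanted = frozenset(names)

    def matches(attr):
        # attr may be None when the tag has no class attribute at all;
        # None is simply not in the set, so such tags are rejected.
        return attr in wanted

    return matches


isFoodOrDrinkOrDessert = make_class_matcher('food', 'drink', 'dessert')

# Usage with Beautiful Soup (not executed here):
#   b = incident.findNextSibling('div', {'class': isFoodOrDrinkOrDessert})

print(isFoodOrDrinkOrDessert('dessert'))  # True
print(isFoodOrDrinkOrDessert('desert'))   # False
```

Building the set once in the closure keeps each call a fast membership test, and adding a fourth class name later is just another argument.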
Kent Johnson wrote:
> localpricemaps at gmail.com wrote:
> > I have some HTML that looks like this, from which I want to scrape
> > the href value (the www.cnn.com part):
> >
> > <div class="noFood">Cheese</div>
> > <div class="food">Blue</div>
> > <a class="btn" href = "http://www.cnn.com">
> >
> >
> > So I wrote this code, which scrapes it perfectly:
> >
> > for incident in row('div', {'class': 'noFood'}):
> >     b = incident.findNextSibling('div', {'class': 'food'})
> >     print(b)
> >     n = b.findNextSibling('a', {'class': 'btn'})
> >     print(n)
> >     link = n['href'] + "','"
> >
> > The problem is that the 2nd tag, the <div class="food"> tag, is
> > sometimes called food and sometimes called drink.
>
> Apparently you are using Beautiful Soup. The value in the attribute
> dictionary can be a callable; try this:
>
> def isFoodOrDrink(attr):
>     return attr in ['food', 'drink']
>
> b = incident.findNextSibling('div', {'class': isFoodOrDrink})
>
> Alternatively, you could omit the class spec and check for it in code.
>
> Kent
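The second suggestion, leaving the class out of the search and checking it in your own code, can be sketched with only the standard library's `html.parser` (Python 3) instead of Beautiful Soup. This is a swapped-in technique, not Beautiful Soup's API, and it assumes the snippet is flat (every tag is a sibling, as in the sample HTML) and that each class attribute holds a single name. `SiblingCollector`, `find_next` and `scrape_links` are hypothetical names. The `nth` parameter of `find_next` returns the 2nd (or later) matching sibling instead of the next one, which is what the subject line asks about.

```python
from html.parser import HTMLParser

# Assumed set of acceptable class names for the second div.
WANTED = {'food', 'drink', 'dessert'}


class SiblingCollector(HTMLParser):
    """Record every start tag in document order as (name, attrs_dict).

    Simplification: treats the snippet as flat, so document order is
    sibling order.
    """

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))


def find_next(tags, start, name, classes, nth=1):
    """Index of the nth tag after position `start` named `name` whose class
    is one of `classes`; None if there is no such tag."""
    seen = 0
    for i in range(start + 1, len(tags)):
        tag, attrs = tags[i]
        # The class test is done here in plain code, not by the parser.
        if tag == name and attrs.get('class') in classes:
            seen += 1
            if seen == nth:
                return i
    return None


def scrape_links(html):
    """Return the href of the a.btn that follows each noFood/WANTED pair."""
    parser = SiblingCollector()
    parser.feed(html)
    parser.close()
    tags = parser.tags
    links = []
    for i, (tag, attrs) in enumerate(tags):
        if tag == 'div' and attrs.get('class') == 'noFood':
            b = find_next(tags, i, 'div', WANTED)
            if b is None:
                continue  # no food/drink/dessert div after this one
            n = find_next(tags, b, 'a', {'btn'})
            if n is not None:
                links.append(tags[n][1].get('href'))
    return links


sample = ('<div class="noFood">Cheese</div>\n'
          '<div class="drink">Blue</div>\n'
          '<a class="btn" href="http://www.cnn.com">')
print(scrape_links(sample))  # ['http://www.cnn.com']
```

Because the class test lives in ordinary code, adding a new class name only means editing `WANTED`, and passing `nth=2` to `find_next` skips the first match.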