Help with using findAll() in BeautifulSoup

Stefan Behnel stefan_ml at
Sat Jul 12 07:55:12 CEST 2008

Alexnb wrote:
> Okay, I am not sure if there is a better way of doing this than findAll(), but
> that is how I am doing it right now.

Consider using lxml.html and lxml.cssselect.

> I am making an app that screen scrapes
> for definitions.

Does the site you are scraping have a policy that allows that?

>  noun
> <table blah>
> <table blah>
> verb
> <table blah>
> I want to be able to do something like findAll('span', {'class': 'pg'}), but have
> it tell me how many <table> elements come after each span, before the next span,
> so I know how many definitions it has.
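If the tables are siblings that follow each span, as the layout above suggests, one way to get a count per part of speech is to walk the parent's children in document order and start a new counter at every span. This is a minimal sketch, assuming the markup is well-formed enough for the standard library's xml.etree.ElementTree; real pages usually need a lenient HTML parser such as lxml.html, whose elements support the same child iteration and .get() calls, so the loop should carry over with little change. `SAMPLE`, `is_pg_span`, and `count_definitions` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the layout in the question: each part
# of speech is a <span class="pg"> followed by sibling <table> elements.
SAMPLE = """
<div>
  <span class="pg">noun</span>
  <table><tr><td>definition 1</td></tr></table>
  <table><tr><td>definition 2</td></tr></table>
  <span class="pg">verb</span>
  <table><tr><td>definition 3</td></tr></table>
</div>
"""


def is_pg_span(elem):
    # True for a <span> whose whitespace-separated class list includes "pg".
    return elem.tag == "span" and "pg" in (elem.get("class") or "").split()


def count_definitions(parent):
    """Map each span.pg label to the number of <table> siblings that follow it
    before the next span.pg. Assumes spans and tables share the same parent."""
    counts = {}
    current = None
    for child in parent:  # direct children, in document order
        if is_pg_span(child):
            current = (child.text or "").strip()
            counts.setdefault(current, 0)
        elif child.tag == "table" and current is not None:
            counts[current] += 1
    return counts


print(count_definitions(ET.fromstring(SAMPLE)))  # {'noun': 2, 'verb': 1}
```

Tables that appear before the first span are ignored, and a label that occurs twice accumulates into one count.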

You didn't say where the "span" is in the HTML code, but lxml.cssselect should
get you pretty close to what you want. If your tables are descendants of the
"span"s, a selector like:

    "span.pg table"

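might work. CSS also has sibling combinators: "a + b" matches a b that
immediately follows an a, and "a ~ b" matches any b that comes after an a
under the same parent.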
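For the descendant case, the standard library's xml.etree.ElementTree supports a limited XPath subset that can express roughly the same thing as the "span.pg table" selector, without installing anything. This is a minimal sketch, assuming well-formed markup and an exact class="pg" attribute (the [@class='pg'] predicate does not match a multi-class attribute such as class="pg x"); the layout and the names `SAMPLE`, `tables_under_pg_spans`, and `tables_per_span` are hypothetical. ElementTree has no sibling axis, so sibling layouts need a loop over the parent's children; lxml.cssselect handles the CSS sibling combinators directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout where each part of speech's tables sit *inside* its span.
SAMPLE = """
<div>
  <span class="pg">noun
    <table><tr><td>definition 1</td></tr></table>
    <table><tr><td>definition 2</td></tr></table>
  </span>
  <span class="pg">verb
    <table><tr><td>definition 3</td></tr></table>
  </span>
</div>
"""


def tables_under_pg_spans(root):
    # Roughly the CSS descendant selector "span.pg table" (exact class match only).
    return root.findall(".//span[@class='pg']//table")


def tables_per_span(root):
    # Run a relative descendant query from each span.pg to count its tables.
    return {
        (span.text or "").strip(): len(span.findall(".//table"))
        for span in root.iterfind(".//span[@class='pg']")
    }


root = ET.fromstring(SAMPLE)
print(len(tables_under_pg_spans(root)))  # 3
print(tables_per_span(root))  # {'noun': 2, 'verb': 1}
```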
