Help with using findAll() in BeautifulSoup

Alexnb alexnbryan at gmail.com
Sat Jul 12 05:46:02 CEST 2008


Okay, I am not sure if there is a better way of doing this than findAll(), but
that is how I am doing it right now. I am making an app that screen-scrapes
dictionary.com for definitions. However, I would like to have the type of
the word (its part of speech) for each definition. For example, if def1 and
def2 are noun definitions but def3 isn't:


noun
 def1
 def2
verb
 def3

Something like that. Now, I can get the definitions just fine, but the
problem comes when I want to get the type. I can get the types, but I don't
know which definitions they go with. So I can get noun and verb, but for
all I know noun is def1 and verb is def2 and def3. I am wondering if there is
a way to use findAll() but have it stop once it hits a certain tag, or some
other way to do just that. For example, if I have

 noun
<table blah>
<table blah>
verb
<table blah>

I want to be able to do something like findAll('span', {'class': 'pg'}), but
have it tell me how many <table> elements come after each span, before the
next such span, so I know how many definitions each type has.
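Since the definition tables are not nested inside the label span, one way to
attack this is to collect the spans and tables together in document order and
then split that flat sequence every time a new label appears. In BeautifulSoup,
passing a list of tag names to findAll (e.g. findAll(['span', 'table']))
returns the matches in the order they appear in the page. The sketch below
(Python 3 syntax) does only the splitting step in plain Python. The
(tag_name, css_class, text) tuples are hypothetical stand-ins for BeautifulSoup
Tag objects, so it runs without BeautifulSoup or a network connection, and it
assumes the page really alternates label spans and definition tables as in the
snippet above.

```python
def group_by_label(elements):
    """Split a document-order sequence into (label, [definitions]) pairs.

    Each element is a hypothetical (tag_name, css_class, text) tuple
    standing in for a BeautifulSoup Tag.
    """
    groups = []
    for name, css_class, text in elements:
        if name == 'span' and css_class == 'pg':
            # A new part-of-speech label starts a new group.
            groups.append((text, []))
        elif name == 'table' and groups:
            # A table belongs to the most recent label seen so far;
            # tables that appear before the first label are skipped.
            groups[-1][1].append(text)
    return groups


# Hypothetical page contents, mirroring the noun/verb example.
page = [
    ('span', 'pg', 'noun'),
    ('table', None, 'def1'),
    ('table', None, 'def2'),
    ('span', 'pg', 'verb'),
    ('table', None, 'def3'),
]

for label, defs in group_by_label(page):
    print(label, len(defs))
    for d in defs:
        print('  ' + d)
```

The number of definitions per type is then just len() of each list, and the
labels and definitions can never get out of step because they are paired up in
a single pass.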

Here is the code I am using (I used "cheese" because that is kinda my test
word for everything in the app):

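import urllib
from BeautifulSoup import BeautifulSoup

class defWord:
    def __init__(self, word):
        self.word = word

        def get_types(term):
            # Fetch the search page; quote() escapes spaces etc. in the term
            url = 'http://dictionary.reference.com/search?q=%s' % urllib.quote(term)
            soup = BeautifulSoup(urllib.urlopen(url))

            # Each part-of-speech label ("noun", "verb", ...) is a <span class="pg">
            for tabs in soup.findAll('span', {'class': 'pg'}):
                yield tabs.contents[0].string

        self.mainList = list(get_types(self.word))
        print self.mainList

word_def = defWord("cheese")  # named word_def so the built-in type() isn't shadowed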
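If you would rather do the counting while the page is being parsed, the
standard library's html.parser module (Python 3) can track the most recent
label as it streams through the tags. Note this swaps in a different technique;
it is not BeautifulSoup. The markup in the sketch is a hypothetical
simplification of what dictionary.com returns, assuming each definition table
follows its <span class="pg"> label before the next label, and it only counts
the tables rather than extracting their text.

```python
from html.parser import HTMLParser


class DefinitionCounter(HTMLParser):
    """Count <table> elements that follow each <span class="pg"> label.

    Assumes (hypothetically) that definitions are top-level tables; a table
    nested inside another table would be counted as a separate definition.
    """

    def __init__(self):
        super().__init__()
        self.counts = []        # list of [label, table_count] pairs
        self._in_label = False  # True while inside a <span class="pg">

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples
        if tag == 'span' and ('class', 'pg') in attrs:
            self._in_label = True
            self.counts.append(['', 0])
        elif tag == 'table' and self.counts:
            # Attribute this table to the latest label seen.
            self.counts[-1][1] += 1

    def handle_endtag(self, tag):
        if tag == 'span':
            self._in_label = False

    def handle_data(self, data):
        if self._in_label:
            # Collect the label text, e.g. "noun".
            self.counts[-1][0] += data.strip()


# Hypothetical markup shaped like the noun/verb example.
html = (
    '<span class="pg">noun</span>'
    '<table><tr><td>def1</td></tr></table>'
    '<table><tr><td>def2</td></tr></table>'
    '<span class="pg">verb</span>'
    '<table><tr><td>def3</td></tr></table>'
)

parser = DefinitionCounter()
parser.feed(html)
parser.close()
print(parser.counts)  # [['noun', 2], ['verb', 1]]
```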

I don't know if this is really something anyone can help me fix or if I have
to do it on my own, but I would love some help.
-- 
Sent from the Python - python-list mailing list archive at Nabble.com.



