[Tutor] python and Beautiful soup question
Mark Lawrence
breamoreboy at yahoo.co.uk
Mon Jun 22 00:55:34 CEST 2015
On 21/06/2015 21:04, Joshua Valdez wrote:
> I'm having trouble making this script work to scrape information from a
> series of Wikipedia articles.
>
> What I'm trying to do is iterate over a series of wiki URLs and pull out
> the page links on a wiki portal category (e.g.
> https://en.wikipedia.org/wiki/Category:Electronic_design).
>
> I know that all the wiki pages I'm going through have a page links section.
> However when I try to iterate through them I get this error message:
>
> Traceback (most recent call last):
>   File "./wiki_parent.py", line 37, in <module>
>     cleaned = pages.get_text()
> AttributeError: 'NoneType' object has no attribute 'get_text'
Presumably because this line
> pages = soup.find("div" , { "id" : "mw-pages" })
doesn't find anything, so pages is set to None, hence the attribute
error on the next line. I'm suspicious of { "id" : "mw-pages" } as it's
a Python dict literal with one entry of key "id" and value "mw-pages".
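
Whatever the cause turns out to be, find() returns None when nothing
matches, so it is worth testing the result before calling get_text() on
it. Below is a minimal sketch of that check, not a fix for the actual
script, which I haven't seen in full. The names page_links_text and
demo, and the two HTML snippets, are made up for illustration. As far as
I know Beautiful Soup accepts a dict in that position as an attribute
filter, so the sketch keeps the original call unchanged.

```python
def page_links_text(soup):
    """Return the text of the div with id "mw-pages", or None if it is absent."""
    # find() returns None when no element matches, so check before using it.
    pages = soup.find("div", {"id": "mw-pages"})
    if pages is None:
        return None
    return pages.get_text()


def demo():
    # Imported here so the helper above does not itself depend on bs4.
    from bs4 import BeautifulSoup

    # Hypothetical snippets: one page with the section, one without.
    with_section = BeautifulSoup('<div id="mw-pages">Page A</div>', "html.parser")
    without_section = BeautifulSoup("<p>no such div</p>", "html.parser")

    print(page_links_text(with_section))
    print(page_links_text(without_section))
```

Calling demo() needs the bs4 package installed. In the real loop, a None
result is the cue to print the URL being processed, so you can see which
page lacks the div or was not fetched as expected.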
--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.
Mark Lawrence