Why doesn't input code return 'plants' as in 'Getting Started with Beautiful Soup' text (on page 30) ?
Peter Otten
__peter__ at web.de
Sun Jul 12 05:51:58 EDT 2015
Simon Evans wrote:
> Dear Mark Lawrence, thank you for your advice.
> I take it that I use the input you suggest for the line :
>
> soup = BeautifulSoup("C:\Beautiful Soup\ecological_pyramid.html",lxml")
>
> seeing as I have to give the file's full address I therefore have to
> modify your :
>
> soup = BeautifulSoup(ecological_pyramid,"lxml")
>
> to :
>
> soup = BeautifulSoup("C:\Beautiful Soup\ecological_pyramid," "lxml")
>
> otherwise I get :
>
>
>>>> with open("C:\Beautiful Soup\ecologicalpyramid.html"."r")as
>>>> ecological_pyramid: soup = BeautifulSoup(ecological_pyramid,"lxml")
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> NameError: name 'ecological_pyramid' is not defined
>
>
> so anyway with the input therefore as:
>
>>>> with open("C:\Beautiful Soup\ecologicalpyramid.html"."r")as
>>>> ecological_pyramid: soup = BeautifulSoup("C:\Beautiful
>>>> Soup\ecological_pyramid,","lxml") producer_entries = soup.find("ul")
>>>> print(producer_entries.li.div.string)
No. If you pass the filename, Beautiful Soup will mistake it for the HTML. You
can verify that in the interactive interpreter:
>>> soup = BeautifulSoup("C:\Beautiful Soup\ecologicalpyramid.html","lxml")
>>> soup
<html><body><p>C:\Beautiful Soup\ecologicalpyramid.html</p></body></html>
You have to pass an open file to BeautifulSoup, not a filename:
>>> with open("C:\Beautiful Soup\ecologicalpyramid.html","r") as f:
...     soup = BeautifulSoup(f, "lxml")
...
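To see the difference without depending on the book's file, here is a minimal sketch that writes a tiny HTML file to a temporary directory and parses it both ways. The temporary path and the shortened HTML are made up for illustration, and I use the "html.parser" builder from the standard library instead of "lxml" so that the sketch runs without lxml installed; with "lxml" the idea should be the same.

```python
import os
import tempfile
import warnings

from bs4 import BeautifulSoup

# Shortened stand-in for the book's file (an assumption, not the real data).
HTML = '<ul id="producers"><li class="producerlist"><div class="name">plants</div></li></ul>'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ecologicalpyramid.html")
    with open(path, "w", encoding="utf-8") as f:
        f.write(HTML)

    # Passing the path: the string itself is treated as the markup.
    with warnings.catch_warnings():
        # bs4 may warn that the markup looks like a filename; silence that here.
        warnings.simplefilter("ignore")
        from_name = BeautifulSoup(path, "html.parser")

    # Passing an open file: the file's contents are parsed.
    with open(path, "r", encoding="utf-8") as f:
        from_file = BeautifulSoup(f, "html.parser")

found_in_name = from_name.find("ul")  # nothing to find, the "document" is just a path
found_in_file = from_file.find("ul")  # the real <ul> element

print(found_in_name is None, found_in_file is not None)
```

Only the soup built from the open file contains a <ul>; the one built from the path has no tags at all.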
However, if you look at the data returned by soup.find("ul") you'll see
>>> producer_entries = soup.find("ul")
>>> producer_entries
<ul id="producers">
<li class="producers">
</li><li class="producerlist">
<div class="name">plants</div>
<div class="number">100000</div>
</li>
<li class="producerlist">
<div class="name">algae</div>
<div class="number">100000</div>
</li>
</ul>
The first <li>...</li> node does not contain a div
>>> producer_entries.li
<li class="producers">
</li>
and thus
>>> producer_entries.li.div is None
True
so accessing .string on it raises an AttributeError, which is expected with the given data.
Returning None is Beautiful Soup's way of indicating that the
<li> node has no <div> child at all. If you want to
process the first li that does have a <div> child, a straightforward
way is to iterate over the children:
>>> for li in producer_entries.find_all("li"):
...     if li.div is not None:
...         print(li.div.string)
...         break  # remove if you want all, not just the first
...
plants
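The same loop can be wrapped in a small function, which makes the None case explicit. This is only a sketch: first_div_text is a hypothetical helper name, the HTML is copied inline from the output shown earlier in this post, and "html.parser" stands in for "lxml" so that nothing beyond bs4 is needed.

```python
from bs4 import BeautifulSoup

# Inline copy of the <ul> shown above, so no file is needed.
HTML = """<ul id="producers">
<li class="producers">
</li><li class="producerlist">
<div class="name">plants</div>
<div class="number">100000</div>
</li>
<li class="producerlist">
<div class="name">algae</div>
<div class="number">100000</div>
</li>
</ul>"""

soup = BeautifulSoup(HTML, "html.parser")
producer_entries = soup.find("ul")

# The first <li> has no <div>, so .div is None and .string fails on it.
try:
    producer_entries.li.div.string
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__


def first_div_text(ul):
    """Return the string of the first <div> inside any <li> of ul, or None."""
    for li in ul.find_all("li"):
        if li.div is not None:
            return li.div.string
    return None


first = first_div_text(producer_entries)
print(error_name, first)
```

Returning None from the helper when no li has a div mirrors what Beautiful Soup itself does for a missing child.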
Taking a second look at the data you probably want the li nodes with
class="producerlist":
>>> for li in soup.find_all("li", attrs={"class": "producerlist"}):
...     print(li.div.string)
...
plants
algae
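If you also want the numbers, the class filter can be combined with a lookup per li. A minimal sketch, again with the data copied inline and "html.parser" used in place of "lxml"; class_ is Beautiful Soup's keyword form of the attrs={"class": ...} filter, and the producers dict is just an illustrative name.

```python
from bs4 import BeautifulSoup

# Inline copy of the <ul> shown above, so no file is needed.
HTML = """<ul id="producers">
<li class="producers">
</li><li class="producerlist">
<div class="name">plants</div>
<div class="number">100000</div>
</li>
<li class="producerlist">
<div class="name">algae</div>
<div class="number">100000</div>
</li>
</ul>"""

soup = BeautifulSoup(HTML, "html.parser")

# Map each producer's name to its number, skipping the empty first <li>
# because it does not carry class="producerlist".
producers = {}
for li in soup.find_all("li", class_="producerlist"):
    name = li.find("div", class_="name").string
    number = li.find("div", class_="number").string
    producers[str(name)] = int(number)

print(producers)
```

Selecting by class avoids the None check entirely, because only li nodes that actually hold the name and number divs are visited.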