[Tutor] (regular expression)

Sat Dec 10 20:52:02 EST 2016

This is the real code:


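import urllib.request
from bs4 import BeautifulSoup

# Download the department page and count the text nodes that match exactly.
with urllib.request.urlopen("https://www.sdstate.edu/electrical-engineering-and-computer-science") as cs:
    cs_page = cs.read()
    soup = BeautifulSoup(cs_page, "html.parser")
    print(len(soup.body.find_all(string=["Engineering", "engineering"])))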

I used Ctrl+F on the page at the link in the code: I get 11 matches with Ctrl+F and 3 from the code.
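
A possible source of the gap, sketched offline under assumptions: when find_all() is given a list of strings, it keeps only text nodes whose whole text equals one of them, while Ctrl+F counts the word anywhere in the visible text. The html string below is a made-up stand-in for the real page, and the variable names are only for illustration. A compiled pattern from re is one way to match inside longer strings.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page, so the sketch runs offline.
html = """
<html><body>
  <h1>Electrical Engineering and Computer Science</h1>
  <p>Engineering</p>
  <p>Our engineering programs cover software engineering too.</p>
  <a href="#">engineering</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# A list of strings matches only text nodes equal to one of them exactly.
exact = soup.body.find_all(string=["Engineering", "engineering"])

# A compiled pattern matches any text node that contains the word.
pattern = re.compile("engineering", re.IGNORECASE)
containing = soup.body.find_all(string=pattern)

# Ctrl+F counts every occurrence, including several inside one text node.
occurrences = sum(len(pattern.findall(node)) for node in containing)

print(len(exact), len(containing), occurrences)
```

Even the occurrence count may not equal the browser's number, since the body can also hold text the browser does not display, such as script contents.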

Thanks


________________________________
From: Tutor <tutor-bounces+itetteh34=hotmail.com at python.org> on behalf of Bob Gailer <bgailer at gmail.com>
Sent: Saturday, December 10, 2016 7:54 PM
To: Tetteh, Isaac - SDSU Student
Cc: Python Tutor
Subject: Re: [Tutor] (no subject)

On Dec 10, 2016 12:15 PM, "Tetteh, Isaac - SDSU Student" <
isaac.tetteh at jacks.sdstate.edu> wrote:
>
> Hello,
>
> I am trying to find the number of times a word occurs on a webpage, so I
> used the bs4 code below.
>
> Let's assume html contains the "html code":
> soup = BeautifulSoup(html, "html.parser")
> print(len(soup.find_all(string=["Engineering","engineering"])))
> But the result is different from when I use Ctrl+F on my keyboard to
> find.
>
> Please help me understand why the results are different. Thanks
> I am using Python 3.5
>
What is the URL of the web page?
To what are you applying control-f?
What are the two different counts you're getting?
Is it possible that the page is being dynamically altered after it's loaded?
_______________________________________________
Tutor maillist  -  Tutor at python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor