[Tutor] Trip Advisor Web Scraping

Francis Pino pinof1 at mail.montclair.edu
Tue Feb 21 21:55:01 EST 2017


I need to recode my hotel ratings so that 1-3 = Negative and 4-5 = Positive.
Can you point me in the right direction? I think I need a loop using for and
in with an if statement, something like: for rating in ratings, if rating >= 4
print('Positive'), else print('Negative'). Here's my code so far.


from bs4 import BeautifulSoup
from selenium import webdriver
import csv

url = "https://www.tripadvisor.com/Hotel_Review-g34678-d87695-Reviews-Clarion_Hotel_Conference_Center-Tampa_Florida.html"
# driver = webdriver.Firefox()
driver = webdriver.Chrome()
driver.get(url)

# The HTML code for the web page is stored in the html variable
html = driver.page_source

# we will use the soup object to parse HTML
# BeautifulSoup reference
# https://www.crummy.com/software/BeautifulSoup/bs4/doc/

soup = BeautifulSoup(html, "lxml")


# we will use the find_all method to find all paragraph tags of class partial_entry
# The result of this command is a list
# we use for .. in [] to iterate through the list

reviews = []
ratings = []

for review in soup.find_all("p", "partial_entry"):
    print(review.text)
    #print ('\n\n')
    # append the review text; += would add the tag's child nodes one by one
    reviews.append(review.text)
    print(len(reviews))

# similarly we can identify the ratings
# note that the code is incomplete - it will require additional work


for rating in soup.find_all("span", "ui_bubble_rating"):
    print(rating.text)
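
The recoding described at the top (1-3 = Negative, 4-5 = Positive) could be sketched roughly as below. This is only a sketch under assumptions: recode_rating and rating_from_classes are hypothetical helper names, and the idea that the score sits in a class name such as bubble_40 (ten times the rating) is an assumption about the page markup that should be checked against the actual page source. The sample data stands in for what rating.get("class", []) might return inside the loop above.

```python
def recode_rating(rating):
    """Map a 1-5 rating to a label: 1-3 -> 'Negative', 4-5 -> 'Positive'."""
    if rating >= 4:
        return "Positive"
    return "Negative"


def rating_from_classes(classes):
    """Hypothetical helper: pull a score out of class names like 'bubble_40'.

    Assumes the score is encoded as bubble_NN, where NN is ten times the
    rating. Returns None when no such class name is present.
    """
    for name in classes:
        if name.startswith("bubble_"):
            suffix = name[len("bubble_"):]
            if suffix.isdigit():
                return int(suffix) / 10
    return None


# Stand-in data; in the scraper this would be rating.get("class", [])
sample_classes = [
    ["ui_bubble_rating", "bubble_50"],
    ["ui_bubble_rating", "bubble_30"],
    ["ui_bubble_rating", "bubble_10"],
]

labels = []
for classes in sample_classes:
    score = rating_from_classes(classes)
    if score is not None:  # skip spans that carry no recognizable score
        labels.append(recode_rating(score))

print(labels)  # ['Positive', 'Negative', 'Negative']
```

Keeping the recoding in its own function means the threshold lives in one place, and the same function can be applied whether the scores come from class names or from some other part of the page.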



Thanks!

