[Tutor] Trip Advisor Web Scraping
Francis Pino
pinof1 at mail.montclair.edu
Tue Feb 21 21:55:01 EST 2017
I need to recode my hotel ratings so that 1-3 is Negative and 4-5 is
Positive. Can you help point me in the right direction? I think I need a
for ... in loop with an if statement, something like: for rating in ratings,
if rating <= 3 print('Negative'), else print('Positive'). Here's my code so far.
from bs4 import BeautifulSoup
from selenium import webdriver
import csv
url = "https://www.tripadvisor.com/Hotel_Review-g34678-d87695-Reviews-Clarion_Hotel_Conference_Center-Tampa_Florida.html"
# driver = webdriver.Firefox()
driver = webdriver.Chrome()
driver.get(url)
# The HTML code for the web page is stored in the html variable
html = driver.page_source
# we will use the soup object to parse HTML
# BeautifulSoup reference
# https://www.crummy.com/software/BeautifulSoup/bs4/doc/
soup = BeautifulSoup(html, "lxml")
# we will use find_all method to find all paragraph tags of class partial_entry
# The result of this command is a list
# we use for .. in [] to iterate through the list
reviews = []
ratings = []
for review in soup.find_all("p", "partial_entry"):
    print(review.text)
    # print('\n\n')
    # store the review text; += would add the tag's child nodes instead
    reviews.append(review.text)
print(len(reviews))
# similarly we can identify the ratings
# note that the code is incomplete - it will require additional work
for rating in soup.find_all("span", "ui_bubble_rating"):
    print(rating.text)
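Here is a rough sketch of the recoding step I have in mind, in case it helps
show what I'm after. It assumes the star value is encoded in a class name such
as "bubble_50" for 5 stars; that naming is my guess about the page markup and
I have not confirmed it. The helper names bubble_to_stars and recode, and the
sample class lists, are made up for illustration.

```python
import re


def bubble_to_stars(css_classes):
    """Return the star rating (1-5) found in a list of class names, or None.

    Assumption: a class token like "bubble_50" means 5 stars.
    """
    for token in css_classes:
        match = re.fullmatch(r"bubble_(\d)0", token)
        if match:
            return int(match.group(1))
    return None


def recode(stars):
    """Map 1-3 to 'Negative' and 4-5 to 'Positive'."""
    if stars not in range(1, 6):
        raise ValueError("expected a rating from 1 to 5, got %r" % (stars,))
    return "Negative" if stars <= 3 else "Positive"


# Made-up sample data standing in for the class lists of the rating spans
sample_classes = [
    ["ui_bubble_rating", "bubble_50"],
    ["ui_bubble_rating", "bubble_20"],
    ["ui_bubble_rating", "bubble_40"],
    ["ui_bubble_rating", "bubble_30"],
]

labels = []
for classes in sample_classes:
    stars = bubble_to_stars(classes)
    if stars is not None:  # skip spans with no recognizable rating class
        labels.append(recode(stars))

print(labels)
```

In the scraping loop, rating["class"] should give the same kind of list of
class names for each span, so it could probably be passed to bubble_to_stars
directly, if the markup assumption holds.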
Thanks!