[Tutor] Unable to download <th>, <td> using Beautifulsoup
badouglas at gmail.com
Fri Jul 29 18:10:04 EDT 2016
Following up on what Walter said: if the site generates the data
dynamically with JavaScript, you need a different approach.
The most "complete" option is a headless browser. However, using a
headless browser has its own share of issues: speed, complexity, etc.
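
A rough sketch of the headless-browser route, in case it helps. It assumes
Selenium and a matching Chrome driver are installed, and that the rendered
<th> cells end up inside the div with id "BuyingSeats" (a guess based on the
code and the 'Inspect' output quoted below, not something I have verified).
The names build_broker_url and render_page are made up for illustration.

```python
BROKER_URL = "http://data.tsci.com.cn/stock/{code}/STK_Broker.htm"


def build_broker_url(code):
    """Build the broker page URL for one stock code (same URL as the quoted script)."""
    return BROKER_URL.format(code=code)


def render_page(url, wait_selector="#BuyingSeats th", timeout=15):
    """Load url in headless Chrome and return the HTML after JavaScript has run.

    wait_selector is an ASSUMPTION about where the data shows up in the DOM.
    """
    # Imported inside the function so the URL helper above works
    # even on a machine without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the script-generated cells exist (or time out).
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_selector))
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()


# Usage (needs a browser and network access, so left commented out):
# from bs4 import BeautifulSoup
# html = render_page(build_broker_url("00939"))
# soup = BeautifulSoup(html, "html.parser")
# print(soup.find_all("div", id="BuyingSeats"))
```

The rendered HTML can then go through BeautifulSoup exactly as in the
original script; the only change is where the HTML comes from.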
A potentially more useful method is to look at the traffic (e.g. with
Live HTTP Headers for Firefox) to get a feel for exactly which requests
the browser makes. At the same time, look at the JavaScript functions the
page calls. I've found it's often enough to craft the requisite cookies
and curl-style requests in order to simulate the browser.
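
For the traffic-replay approach, something like the following could work as
a starting point. It uses only the standard library. The data URL, the
cookie names, and the header values are all placeholders: you would copy the
real ones from whatever the traffic log shows for this site. The function
names are made up for illustration.

```python
import urllib.request


def build_replay_request(data_url, referer, cookies=None):
    """Build a request that mimics the browser's background data call.

    data_url : URL of the background request seen in the traffic log
    referer  : URL of the page that triggered that request
    cookies  : dict of cookie names/values copied from the browser
    """
    headers = {
        # ASSUMPTION: example header values; copy the real ones from the log.
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; rv:47.0) Gecko/20100101 Firefox/47.0",
        "Referer": referer,
        "X-Requested-With": "XMLHttpRequest",
    }
    if cookies:
        # A Cookie header is just "name=value" pairs joined by "; ".
        headers["Cookie"] = "; ".join(f"{k}={v}" for k, v in cookies.items())
    return urllib.request.Request(data_url, headers=headers)


def fetch(request, timeout=10):
    """Send the request and return the decoded body (needs network access)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Usage (left commented out; data_url below is a HYPOTHETICAL placeholder):
# page = "http://data.tsci.com.cn/stock/00939/STK_Broker.htm"
# data_url = "http://example.com/replace-with-url-from-traffic-log"
# req = build_replay_request(data_url, referer=page,
#                            cookies={"session": "value-from-browser"})
# print(fetch(req)[:500])
```

If the response comes back as JSON or an HTML fragment, it can be parsed
directly without rendering the page at all, which is where the speed
advantage over a headless browser comes from.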
In a few cases, though, I've run across situations where a headless browser
is the only real solution.
On Fri, Jul 29, 2016 at 3:28 AM, Crusier <crusier at gmail.com> wrote:
> I am using Python 3 on Windows 7.
> However, I am unable to download some of the data listed in the web
> site as follows:
> 453.IMC 98.28M 18.44M 4.32 5.33
> 1499.Optiver 70.91M 13.29M 3.12 5.34
> 7387.花旗环球 52.72M 9.84M 2.32 5.36
> When I use Google Chrome and use 'View Page Source', the data does not
> show up at all. However, when I use 'Inspect', I am able to read the
> '<th>1499.Optiver </th>'
> '<td> 70.91M</td>'
> '<td>13.29M </td>'
> Please kindly explain to me if the data is hidden in a CSS style sheet or
> is there any way to retrieve the data listed.
> Thank you
> Regards, Crusier
> from bs4 import BeautifulSoup
> import urllib
> import requests
> stock_code = ('00939', '0001')
> def web_scraper(stock_code):
>     broker_url = 'http://data.tsci.com.cn/stock/'
>     end_url = '/STK_Broker.htm'
>     for code in stock_code:
>         new_url = broker_url + code + end_url
>         response = requests.get(new_url)
>         html = response.content
>         soup = BeautifulSoup(html, "html.parser")
>         Buylist = soup.find_all('div', id="BuyingSeats")
>         Selllist = soup.find_all('div', id="SellSeats")