[Tutor] Unable to download <th>, <td> using Beautifulsoup

Alan Gauld alan.gauld at yahoo.co.uk
Fri Jul 29 11:46:19 EDT 2016


On 29/07/16 08:28, Crusier wrote:

> When I use Google Chrome and use 'View Page Source', the data does not
> show up at all. However, when I use 'Inspect', I am able to read the
> data.
> 
> '<th>1453.IMC</th>'
> '<td>98.28M</td>'

> '<td>3.12</td>'
> '<td>5.34</td>'
> 
> Please kindly explain to me if the data is hidden in a CSS style sheet or
> if there is any way to retrieve the data listed.

I don't know the answer, but I would suggest that you print
out (or send to a file) the entire HTML source returned by
the server. That way you can see what is actually being sent and,
from that, perhaps figure out what to do with BS to extract what
you need.


> from bs4 import BeautifulSoup
> import urllib
> import requests
> 
> stock_code = ('00939', '0001')
> 
> def web_scraper(stock_code):
> 
>     broker_url = 'http://data.tsci.com.cn/stock/'
>     end_url = '/STK_Broker.htm'
> 
>     for code in stock_code:
>         new_url  = broker_url + code + end_url
>         response = requests.get(new_url)
>         html = response.content

Try sending html to a file and examining it in a
text editor...
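
Something along these lines might do as a starting point. It is only a
rough sketch: the function names, the default file name page_dump.html
and the 30 second timeout are my own placeholders, not anything from
your script. It writes the raw bytes to a file and checks whether one
of the values you saw under 'Inspect' (e.g. 1453.IMC) is in there at all.

```python
from pathlib import Path


def dump_html(html: bytes, path: str = "page_dump.html") -> Path:
    """Write the raw bytes returned by the server to a file, unmodified."""
    out = Path(path)
    out.write_bytes(html)
    return out


def marker_in_html(html: bytes, marker: str) -> bool:
    """Report whether a value seen in the browser's Inspect view
    appears anywhere in the raw HTML bytes."""
    return marker.encode("utf-8") in html


def fetch_and_dump(url: str, path: str = "page_dump.html") -> Path:
    """Fetch url with requests (as the quoted script does) and dump the body.

    Hypothetical helper: the name and the timeout value are assumptions.
    """
    import requests  # imported here so the helpers above work without it

    response = requests.get(url, timeout=30)
    return dump_html(response.content, path)
```

Inside your loop you could call dump_html(html, code + '.html') and then
marker_in_html(html, '1453.IMC'). If that comes back False, the value is
not in what the server sent, which would be consistent with 'View Page
Source' not showing it; the table is then probably being filled in after
the page loads, and BS on its own will not see it.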


-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos



