[Tutor] Beautiful Soup

Laura Creighton lac at openend.se
Tue Sep 29 20:48:49 CEST 2015


>> Hi
>>
>> I have recently finished reading "Starting out with Python" and I
>> really want to do some web scraping. Please kindly advise where I can
>> get more information about BeautifulSoup. It seems that Documentation
>> is too hard for me.
>>
>> Furthermore, I have tried to scrape this site but it seems that there
>> is an error (<http.client.HTTPResponse object at 0x02C09F90>). Please
>> advise what I should do in order to overcome this.
>>
>>
>> from bs4 import BeautifulSoup
>> import urllib.request
>>
>> HKFile = urllib.request.urlopen("https://bochk.etnet.com.hk/content/bochkweb/tc/quote_transaction_daily_history.php?code=2388")
>> HKHtml = HKFile.read()
>> HKFile.close()
>>
>> print(HKFile)

<http.client.HTTPResponse object at 0x02C09F90> is not an error.

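It is just how Python displays the response object itself, rather than
the contents of the page.

If you want to print the page, change print(HKFile)
to print(HKHtml.decode("some-encoding")), where some-encoding is the
encoding the website uses. These days utf-8 is the most likely.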
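
Here is a minimal sketch of that change, using only the standard library.
The names decode_body and fetch_text are made up for illustration, and
falling back to utf-8 when the server does not announce a charset is an
assumption, not something I have checked against this particular site.

```python
import urllib.request


def decode_body(raw_bytes, charset=None, fallback="utf-8"):
    """Turn the bytes returned by read() into a str.

    charset is whatever the server announced (may be None);
    fallback is an assumed default, not a guarantee.
    """
    return raw_bytes.decode(charset or fallback, errors="replace")


def fetch_text(url):
    """Fetch url and return the page source as text."""
    # urlopen() returns an http.client.HTTPResponse. Printing that object
    # only shows its repr, so read() the body and decode it instead.
    with urllib.request.urlopen(url) as response:
        raw_bytes = response.read()
        # Charset from the Content-Type header, or None if not given.
        charset = response.headers.get_content_charset()
    return decode_body(raw_bytes, charset)
```

Calling print(fetch_text(...)) with the URL from your script should then
print the HTML of the page instead of the object's repr. The text it
returns can also be handed to BeautifulSoup for parsing.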

If you want a tutorial on web scraping in general, rather than on
Beautiful Soup in particular, try:
http://doc.scrapy.org/en/latest/intro/tutorial.html

which is about using Scrapy, a set of useful web scraping tools.

The Scrapy wiki is also useful:
https://github.com/scrapy/scrapy/wiki

There are also many video tutorials available, if you like that sort of
thing. Just search for "python scrapy tutorial".

Laura

