[Tutor] String Problem

Cameron Simpson cs at zip.com.au
Mon Jul 6 10:19:45 CEST 2015


On 06Jul2015 15:44, Crusier <crusier at gmail.com> wrote:
>Dear All,
>
>I have used urllib.request to download some information from a site.
>
>I am currently using Python 3.4. My program is as follows:
>
>import urllib.request
>
>response = urllib.request.urlopen('http://www.hkex.com.hk/eng/ddp/Contract_Details.asp?PId=175')
>
>saveFile = open('HKEX.txt','w')
>saveFile.write(str(response.read()))
>saveFile.close()
>
>And the result is as follows:
>
>d align="right"> - </td><td align="right">0</td><td
[...]
>Please let me know how to deal with this string. I hope to put it into a
>table first. Eventually, I hope to be able to put all of this into a
>database. I need some guidance on which area of coding I should look into.

Look into the BeautifulSoup library, which will parse HTML. That will let you 
locate the TABLE element and extract the content by walking the rows (TR) and 
cells (TD).

Start here:

  http://www.crummy.com/software/BeautifulSoup/bs4/doc/

You can install bs4 using pip (the package name on PyPI is beautifulsoup4), or
in other ways:

  http://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-beautiful-soup
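
As a rough sketch of the row/cell walk described above, and not something
tested against the real HKEX page: the SAMPLE_HTML text and the table_rows
function name below are made up for illustration, and the sketch assumes the
data you want is in the first TABLE of the document. The real page may well
have several tables, or nested ones, so expect to adjust how the table is
located.

```python
from bs4 import BeautifulSoup

# Hypothetical sample standing in for the downloaded page. In the real
# program the bytes returned by response.read() could be passed to
# BeautifulSoup in its place.
SAMPLE_HTML = """
<html><body>
<table>
  <tr><th>Contract</th><th>Volume</th></tr>
  <tr><td align="right">Jul-15</td><td align="right">0</td></tr>
  <tr><td align="right">Aug-15</td><td align="right">12</td></tr>
</table>
</body></html>
"""


def table_rows(html):
    """Return the first TABLE in html as a list of rows of cell text."""
    # "html.parser" is the parser that ships with Python itself.
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    rows = []
    # Walk every row (TR), collecting the text of its cells (TD or TH).
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
        if cells:
            rows.append(cells)
    return rows


for row in table_rows(SAMPLE_HTML):
    print(row)
```

Each row comes back as a plain list of strings, which is a convenient shape
for writing to a CSV file or inserting into a database later.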

Cheers,
Cameron Simpson <cs at zip.com.au>

30 years ago, I made some outrageous promises about AI. I didn't deliver.
Neither did you. This is all your fault. - Marvin Minsky, IJCAI'91 (summary)

