For one of our casino websites we're trying the BeautifulSoup module, which can parse both HTML and XML. It should provide simple methods for searching, navigating, and modifying the parse tree. The open question is whether we can store and push this right away.

The example below prints every link on a webpage whose href starts with http://:

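# Python 2 with BeautifulSoup 3 (urllib2 and the print statement do not exist in Python 3)
from BeautifulSoup import BeautifulSoup
import urllib2
import re

# Download the page, then parse it into a tree
html_page = urllib2.urlopen("https://www.online-casino-spielautomaten.de")
soup = BeautifulSoup(html_page)
# Keep only <a> tags whose href starts with "http://"
for link in soup.findAll('a', attrs={'href': re.compile("^http://")}):
    print link.get('href')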

The raw HTML is downloaded with this line:


html_page = urllib2.urlopen("https://www.online-casino-spielautomaten.de")

A BeautifulSoup object is then created from the response, and we use it to find every a tag whose href matches the regular expression:


soup = BeautifulSoup(html_page)
for link in soup.findAll('a', attrs={'href': re.compile("^http://")}):
    print link.get('href')
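
The snippets above target Python 2 and BeautifulSoup 3. As a rough sketch of the same idea on Python 3, assuming the beautifulsoup4 package (imported as bs4) is installed, the parsing step could be separated from the download so it can be tried on any HTML string. The helper name extract_links and the widened pattern that also accepts https:// links are assumptions for illustration, not part of the original example.

```python
import re
from urllib.request import urlopen

from bs4 import BeautifulSoup  # assumption: beautifulsoup4 is installed

# Assumption: unlike the original pattern, this one also accepts https:// links.
ABSOLUTE_LINK = re.compile(r"^https?://")


def extract_links(html):
    """Return the href of every <a> tag whose href starts with http:// or https://."""
    # "html.parser" is the parser bundled with Python, so no extra dependency is needed.
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=ABSOLUTE_LINK)]


if __name__ == "__main__":
    # Same site as in the example above; this part needs network access.
    with urlopen("https://www.online-casino-spielautomaten.de") as response:
        for href in extract_links(response.read()):
            print(href)
```

Keeping the download inside the main guard means extract_links can be checked against a small hand-written HTML string before it is pointed at the live site.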