For one of our casino websites we're trying the BeautifulSoup module which can handle HTML and XML. It should provide a simple method for searching, navigating and modifying the parse tree. However? Can we store and push this right away?
The example below prints all links on a webpage:
from BeautifulSoup import BeautifulSoup import urllib2 import re
html_page = urllib2.urlopen("https://www.online-casino-spielautomaten.de") soup = BeautifulSoup(html_page) for link in soup.findAll('a', attrs={'href': re.compile("^http://")}): print link.get('href')
It downloads the raw html code with the line:
html_page = urllib2.urlopen("https://www.online-casino-spielautomaten.de")
A BeautifulSoup object is created and we use this object to find all links:
soup = BeautifulSoup(html_page) for link in soup.findAll('a', attrs={'href': re.compile("^http://")}): print link.get('href')
participants (2)
-
Ash Khalifa
-
refer sent