You can use the httplib library to download the HTML. To extract the text from it, you can either use an HTML parsing library (search for one that suits you) or use regular expressions.
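Here is a minimal sketch of that approach, assuming Python 3, where httplib is named `http.client`. It uses the standard library's `html.parser` for the extraction step instead of regular expressions. The names `fetch_html`, `extract_text`, and `TextExtractor` are made up for this example, and the HTTPS connection, 10-second timeout, and UTF-8 fallback are my assumptions, not requirements.

```python
import http.client
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def fetch_html(host, path="/"):
    """Download a page over HTTPS and return it as a string (hypothetical helper)."""
    conn = http.client.HTTPSConnection(host, timeout=10)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        body = response.read()
        # Fall back to UTF-8 when the server does not declare a charset (assumption).
        charset = response.headers.get_content_charset() or "utf-8"
        return body.decode(charset, errors="replace")
    finally:
        conn.close()


def extract_text(html):
    """Return the visible text of an HTML string, joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

You would call it as `extract_text(fetch_html("example.com", "/"))`. Note that this sketch does not follow redirects or check the status code. A parser is usually more reliable than regular expressions here, since regexes tend to break on nested tags, comments, and inline scripts.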