[Tutor] reading web page with BeautifulSoup

Alan Gauld alan.gauld at btinternet.com
Thu Dec 13 09:02:37 CET 2012


On 13/12/12 01:47, Ed Owens wrote:
>  >>> from urllib2 import urlopen
>  >>> page = urlopen('w1.weather.gov/obhistory/KDCA.html')
> Traceback (most recent call last):
> ValueError: unknown url type: w1.weather.gov/obhistory/KDCA.html

> copy the url from the error message into my browser and get the page.

Browsers have evolved to make all sorts of intelligent guesses about
what the true URL is, based on what the user types in. They try
prepending various schemes and adding prefixes and suffixes (for
example, you can usually leave out the www part or the .com part).

urlopen makes no such assumptions; you must provide the full URL
(the port is optional), including the scheme (http, https, ftp,
file etc).
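
As a rough sketch of the point above, one way to guard against a
missing scheme is to add it before calling urlopen. The helper names
ensure_scheme and fetch are made up for illustration (they are not
part of urllib2), and defaulting to http is an assumption; the site
may prefer https.

```python
try:
    from urllib.request import urlopen   # Python 3 location
except ImportError:
    from urllib2 import urlopen          # Python 2, as in the session above


def ensure_scheme(url, default="http"):
    """Hypothetical helper: prepend a scheme if the URL has none."""
    if "://" in url:
        # A scheme is already present, so leave the URL alone.
        return url
    return default + "://" + url


def fetch(url):
    """Hypothetical helper: open the URL once it carries a scheme."""
    return urlopen(ensure_scheme(url))
```

With that in place, fetch('w1.weather.gov/obhistory/KDCA.html') asks
urlopen for 'http://w1.weather.gov/obhistory/KDCA.html' instead of
raising the ValueError. Simply typing the http:// prefix yourself
works just as well.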

HTH
-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/


