[Tutor] reading web page with BeautifulSoup

Ed Owens eowens0124 at gmx.com
Thu Dec 13 03:11:56 CET 2012


On 12/12/12 9:03 PM, Dave Angel wrote:
> On 12/12/2012 08:47 PM, Ed Owens wrote:
>> >>> from urllib2 import urlopen
>> >>> page = urlopen('w1.weather.gov/obhistory/KDCA.html')
>> Traceback (most recent call last):
>>    File "<stdin>", line 1, in <module>
>>    File
>> "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py",
>> line 126, in urlopen
>>      return _opener.open(url, data, timeout)
>>    File
>> "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py",
>> line 386, in open
>>      protocol = req.get_type()
>>    File
>> "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py",
>> line 248, in get_type
>>      raise ValueError, "unknown url type: %s" % self.__original
>> ValueError: unknown url type: w1.weather.gov/obhistory/KDCA.html
>> Can anyone see what I'm doing wrong here?  I have bs4 and urllib2
>> imported, and get the above error when trying to read that page.  I
>> can copy the url from the error message into my browser and get the page.
> Like the error says, unknown type.  Prepend the type of the url, and it
> should work fine:
>
> page = urlopen('http://w1.weather.gov/obhistory/KDCA.html')
>

Yep, that was it.  Thanks for the help.  Now on to fight with BeautifulSoup.
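
For anyone who finds this thread later, here is a minimal sketch of guarding against the same ValueError. It assumes Python 3, where urllib.parse takes the place of Python 2's urlparse module; ensure_scheme is a hypothetical helper name, not part of urllib2 or bs4, and defaulting to "http" is just an assumption that fits the page above.

```python
from urllib.parse import urlsplit


def ensure_scheme(url, default="http"):
    """Return url unchanged if it already has a scheme, else prepend default://.

    Hypothetical helper for illustration only; not part of any library.
    """
    # urlsplit() reports an empty scheme for a bare 'host/path' string,
    # which is the case that made urlopen raise "unknown url type".
    if urlsplit(url).scheme:
        return url
    return "%s://%s" % (default, url)


fixed = ensure_scheme('w1.weather.gov/obhistory/KDCA.html')
# fixed can now be handed to urlopen (urllib.request.urlopen in Python 3).
```

This only covers the missing-scheme case from the traceback.  A string such as host:port may be read as already having a scheme, so treat it as a sketch rather than a general URL fixer.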

Ed


