[Tutor] reading web page with BeautifulSoup

Thu Dec 13 03:58:03 CET 2012

On Dec 12, 2012, at 8:54 PM, tutor-request at python.org wrote:

> Date: Wed, 12 Dec 2012 20:47:58 -0500
> From: Ed Owens <eowens0124 at gmx.com>
> To: tutor at python.org
> Subject: [Tutor] reading web page with BeautifulSoup
> Message-ID: <50C933CE.5010503 at gmx.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> 
> >>> from urllib2 import urlopen
> >>> page = urlopen('w1.weather.gov/obhistory/KDCA.html')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 126, in urlopen
>     return _opener.open(url, data, timeout)
>   File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 386, in open
>     protocol = req.get_type()
>   File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 248, in get_type
>     raise ValueError, "unknown url type: %s" % self.__original
> ValueError: unknown url type: w1.weather.gov/obhistory/KDCA.html
> >>>
> 
> Can anyone see what I'm doing wrong here? 

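Yes. urlopen needs the full URL, including the scheme (the "http://" part). Without it, urllib2 cannot tell which protocol to use, which is exactly what "unknown url type" is complaining about: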

urlopen('http://w1.weather.gov/obhistory/KDCA.html')
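If you often paste addresses without the scheme, a small guard can save you from this error. What follows is only a minimal sketch: ensure_scheme is a hypothetical helper of my own (not part of urllib2), and it assumes that a missing scheme should default to http and that checking for "://" is a good enough test.

```python
def ensure_scheme(url, default_scheme='http'):
    """Hypothetical helper: prepend a scheme if the URL lacks one.

    Assumption: any string containing '://' already has a scheme.
    This is a simple heuristic, not a full URL validator.
    """
    if '://' in url:
        return url
    return default_scheme + '://' + url


fixed = ensure_scheme('w1.weather.gov/obhistory/KDCA.html')
print(fixed)  # http://w1.weather.gov/obhistory/KDCA.html

# The result can then be handed to urlopen as before, for example:
#   from urllib2 import urlopen
#   page = urlopen(fixed)
```

A URL that already starts with http:// or https:// passes through unchanged, so it is safe to call this on every address before opening it.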

By the way, a subject line that mentions URLs or urlopen would serve you better, since the problem has nothing to do with BeautifulSoup :>)

Take care,
Don