[Tutor] reading web page with BeautifulSoup
Don Jennings
dfjennings at gmail.com
Thu Dec 13 03:58:03 CET 2012
On Dec 12, 2012, at 8:54 PM, tutor-request at python.org wrote:
> Date: Wed, 12 Dec 2012 20:47:58 -0500
> From: Ed Owens <eowens0124 at gmx.com>
> To: tutor at python.org
> Subject: [Tutor] reading web page with BeautifulSoup
> Message-ID: <50C933CE.5010503 at gmx.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> >>> from urllib2 import urlopen
> >>> page = urlopen('w1.weather.gov/obhistory/KDCA.html')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 126, in urlopen
>     return _opener.open(url, data, timeout)
>   File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 386, in open
>     protocol = req.get_type()
>   File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 248, in get_type
>     raise ValueError, "unknown url type: %s" % self.__original
> ValueError: unknown url type: w1.weather.gov/obhistory/KDCA.html
> >>>
>
> Can anyone see what I'm doing wrong here?
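Yes. urlopen needs the full URL, including the scheme (the 'http://' part). Without it, urllib2 cannot tell which protocol to use, which is why it raises "unknown url type":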
urlopen('http://w1.weather.gov/obhistory/KDCA.html')
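If the URLs come from somewhere you don't control, it may help to check for a scheme before calling urlopen. Here is a minimal sketch of that idea; the helper name ensure_scheme and the default of 'http' are my own assumptions for illustration, not part of urllib2, and it does not try to handle every form of input:

```python
try:
    from urllib.parse import urlparse   # Python 3
except ImportError:
    from urlparse import urlparse       # Python 2, as in the session above


def ensure_scheme(url, default_scheme='http'):
    """Return url unchanged if it already has a scheme, else prepend one.

    ensure_scheme is a hypothetical helper for illustration only.
    """
    if urlparse(url).scheme:
        return url
    return default_scheme + '://' + url


fixed = ensure_scheme('w1.weather.gov/obhistory/KDCA.html')
print(fixed)  # http://w1.weather.gov/obhistory/KDCA.html
```

The result can then be passed to urlopen in place of the bare host and path.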
By the way, your subject line would be more helpful if it mentioned urlopen or URLs, since the problem is completely unrelated to BeautifulSoup :>)
Take care,
Don