Simple example that won't work!

Sun Jun 24 05:43:08 EDT 2001

    > I posted this to the Python Tutor list, but wasn't really satisfied with
    > my result. I'll just repaste it here, and hope for better luck ;-)

jay,

sorry to hear about that.  the Tutor list is usually very responsive!

    > Hello to all my fellow Python lovers. I'm only learning Python right now
    > (after a few years working with the comparatively terrible C/C++) and I am
    > absolutely loving it. To learn the language, I am using Wesley Chun's
    > "Core Python Programming".  It's a very good book with some really good
    > examples, but it's with one of the examples that I'm having difficulty.

thanks for the kudos.  i'm glad to hear it's working out for you.

    > The example is a simple web-based example. All it does is retrieve an HTML
    > document, and print out the first and last non-blank lines of the page.
    > The error, though, occurs with the urlretrieve( ) call. When I call it, I
    > get the following exception message:
    >
    > Traceback (most recent call last):
    >   File "C:\Program Files\Python20\Pythonwin\pywin\framework\scriptutils.py", line 301, in RunScript
    >     exec codeObject in __main__.__dict__
    >		:
    >   File "c:\program files\python20\lib\httplib.py", line 330, in __init__
    >     self._set_hostport(host, port)
    >   File "c:\program files\python20\lib\httplib.py", line 336, in _set_hostport
    >     port = int(host[i+1:])
    > ValueError: invalid literal for int():

like some of the others on the list, i have not been able to reproduce
your problem.  the URL listed in your example points at a dead server
anyway, so if we try it on something live, we see something like this:

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
% python
Python 2.1 (#15, Apr 16 2001, 18:25:49) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
IDLE 0.8 -- press F1 for help
>>> from urllib import urlretrieve
>>> url = 'http://www.yahoo.com:80/'
>>> pair = urlretrieve(url)
>>> pair
('C:\\WINDOWS\\TEMP\\~-259995-0', <mimetools.Message instance at 00B5D39C>)
>>> f = open(pair[0])
>>> data = f.readlines()
>>> f.close()
>>> print data[-1]
<a href=r/ao>Advertising</a><p>Copyright © 2001 Yahoo! Inc. All rights reserved.</small><br><a href=r/pv>Privacy Policy</a></form></center></body></html>

>>>
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
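
the session above only prints the last line. if you want the first and last
non-blank lines, as you describe, here is a minimal sketch. it is not the
book's listing, and first_and_last_nonblank is a made-up name used only for
illustration:

```python
def first_and_last_nonblank(lines):
    # keep only lines that contain something besides whitespace
    nonblank = [line for line in lines if line.strip()]
    if not nonblank:
        return None, None       # nothing but blank lines (or no lines at all)
    return nonblank[0], nonblank[-1]

# sample data standing in for f.readlines() on the retrieved file
data = ['\n', '<html>\n', '   \n', '</html>\n', '\n']
first, last = first_and_last_nonblank(data)
print(first.strip())
print(last.strip())
```

with a real page you would pass in the list returned by f.readlines().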

based on the error message, it looks like the problem occurs when
httplib tries to convert the port number (i.e., "3600" in your example)
into an integer.
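
to make that concrete, here is a minimal sketch of the failing step.
port_from_host is a made-up helper (it is not httplib's code) that copies
the int(host[i+1:]) line from your traceback, and the default of 80 is an
assumption. the blank after "int():" in your message hints that the text
after the colon may have been empty, but that is only a guess:

```python
def port_from_host(host, default=80):
    # hypothetical helper, not httplib itself: mimics port = int(host[i+1:])
    i = host.find(':')
    if i < 0:
        return default          # no colon, so fall back to the assumed default
    return int(host[i+1:])      # raises ValueError if the text is not a number

for host in ('www.yahoo.com:80', 'www.yahoo.com', 'www.yahoo.com:'):
    try:
        print('%s -> %s' % (host, port_from_host(host)))
    except ValueError:
        print('%s -> ValueError' % host)
```

only the last host, with nothing after the colon, should hit the ValueError
branch.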

check out this example...

>>> from urllib import splithost, splitport, splittype
>>>
>>> splithost(url)				# no leading '//', so no host is found
(None, 'http://www.yahoo.com:80/')
>>> splittype(url)				# splits scheme from rest of URL
('http', '//www.yahoo.com:80/')
>>> splithost(splittype(url)[1])		# pulls out host:port pair
('www.yahoo.com:80', '/')
>>> splitport(splithost(splittype(url)[1])[0])	# splits host and port
('www.yahoo.com', '80')

if you get a valid integer string, such as 80 in the above example,
then the call to the built-in function int() should not fail:

>>> int(splitport(splithost(splittype(url)[1])[0])[1])
80
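
if you want to test a URL before handing it to urlretrieve(), the steps
above can be rolled into one function. this is only a rough sketch: it uses
urlparse() from the standard library instead of the split*() calls in the
session, split_port is a hypothetical name, and it does not try to handle
anything fancier than a plain host:port pair:

```python
try:
    from urllib.parse import urlparse   # newer interpreters
except ImportError:
    from urlparse import urlparse       # older interpreters

def split_port(url):
    # hypothetical helper: returns (host, port_text, ok), where ok tells
    # whether the port is absent or a plain run of digits
    netloc = urlparse(url)[1]           # e.g. 'www.yahoo.com:80'
    i = netloc.rfind(':')
    if i < 0:
        return netloc, None, True       # no explicit port, nothing to convert
    port_text = netloc[i+1:]
    return netloc[:i], port_text, port_text.isdigit()

for url in ('http://www.yahoo.com:80/', 'http://www.yahoo.com/',
            'http://www.yahoo.com:/'):
    print('%s -> %s' % (url, split_port(url)))
```

a false value in the last slot means int() would choke on that port string.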

hope this helps!

-wesley

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
"Core Python Programming", Prentice Hall PTR, December 2000
    http://starship.python.net/crew/wesc/cpp/

wesley.j.chun :: wesc at baypiggies.org
cyberweb.consulting :: silicon.valley, ca
http://www.roadkill.com/~wesc/cyberweb/