[Python-bugs-list] [ python-Bugs-588714 ] urllib.urlopen.geturl() and redirects
noreply@sourceforge.net
Fri, 02 Aug 2002 07:36:12 -0700
Bugs item #588714, was opened at 2002-07-30 19:11
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=588714&group_id=5470
Category: Python Library
Group: Python 2.2.1
>Status: Closed
>Resolution: Invalid
Priority: 5
Submitted By: Matthias Klose (doko)
Assigned to: Jeremy Hylton (jhylton)
Summary: urllib.urlopen.geturl() and redirects
Initial Comment:
[From http://bugs.debian.org/146408]
From: Matthew Vernon <matthew@pick.ucam.org>
Subject: python2.2: urllib.urlopen.geturl() fails to deal with redirects properly
urllib.urlopen.geturl() claims:
"The geturl() method returns the real URL of the page. In some cases,
the HTTP server redirects a client to another URL. The urlopen()
function handles this transparently, but in some cases the caller
needs to know which URL the client was redirected to. The geturl()
method can be used to get at this redirected URL."
But it appears not to:
>>> urllib.urlopen("http://www.google.com/search?q=test&btnI=I'm+Feeling+Lucky").geturl()
"http://www.google.com/search?q=test&btnI=I'm+Feeling+Lucky"
Doing the same by steam:
HEAD http://www.google.com/search?q=test&btnI=I'm+Feeling+Lucky HTTP/1.1
Host: www.google.com
HTTP/1.0 302 Moved Temporarily
Content-Length: 151
Server: GWS/2.0
Date: Thu, 09 May 2002 16:51:37 GMT
Location: http://www.toefl.org/
Content-Type: text/html
----------------------------------------------------------------------
>Comment By: Jeremy Hylton (jhylton)
Date: 2002-08-02 14:36
Message:
Logged In: YES
user_id=31392
The original bug report was that geturl() returns the
incorrect result. In this case, it has returned the correct
URL because Google did not redirect the request. There is no
Python bug here, so I trust the Debian folks will close their
bug report, too. The original poster should probably take up
the issue with Google, or set a custom user-agent header.
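
A minimal sketch of the "custom user-agent header" suggestion, written for Python 3, where urllib.request is the successor to the urllib and urllib2 modules discussed in this report. Everything in it is an assumption made for illustration: the local test server, the /start and /final paths, and the agent string "example-client/0.1" are hypothetical and stand in for the real site. It shows that geturl() reports the final URL once a redirect is actually followed, and that the custom User-Agent is what the server sees.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical local server: /start redirects to /final,
    and /final echoes back the User-Agent header it received."""

    def do_GET(self):
        if self.path == "/start":
            self.send_response(302)
            self.send_header("Location", "/final")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = self.headers.get("User-Agent", "").encode("ascii")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch(url, user_agent):
    """Open url with a custom User-Agent; return (final URL, body text)."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    # An empty ProxyHandler bypasses any proxy configured in the
    # environment, so the demo talks to the local server directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as response:
        return response.geturl(), response.read().decode("ascii")


def run_redirect_demo():
    """Start the local server on a free port, fetch /start, shut down."""
    server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_port
    try:
        return base, fetch(base + "/start", "example-client/0.1")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    base, (final_url, seen_agent) = run_redirect_demo()
    print(final_url)   # the URL after the redirect, not the one requested
    print(seen_agent)  # the User-Agent the server received
```

Whether a given site accepts a non-default User-Agent is a matter of that site's terms of service, not something this sketch can settle.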
----------------------------------------------------------------------
Comment By: Jeremy Hylton (jhylton)
Date: 2002-07-31 16:52
Message:
Logged In: YES
user_id=31392
The body of the error message is interesting. Google is
explicitly refusing to serve requests issued by urllib and
urllib2. It appears to be keying on the User-Agent field.
<HTML><HEAD><TITLE>403 Forbidden</TITLE></HEAD>
<BODY><H1>403 Forbidden</H1>
Your client does not have permission to get URL
<code>/search?q=test&btnI=I'm+Feeling+Lucky</code> from
this server.
(Client IP address: 208.251.201.35)<BR><BR>
Please see Google's Terms of Service posted at
http://www.google.com/terms_of_service.html
<BR><BR><P>If you believe that you have received this
response in error, please send email to <A
href="mailto:forbidden@google.com">forbidden@google.com</A>.
Before sending this email, however, please make sure to
take a look at our Terms of Service
(http://www.google.com/terms_of_service.html).In your email,
please send us the <b>entire</b> code displayed below.
Please also send us any information you may know about how
you are performing your Google searches-- for example, "I'm
using the Opera browser on Linux to do searches from home.
My Internet access is through a dial-up account I have with
the FooCorp ISP." or "I'm using the Konqueror browser on
Linux to search from my job at myFoo.com. My machine's IP
address is 10.20.30.40, but all of myFoo's web traffic goes
through some kind of proxy server whose IP address is
10.11.12.13." (If you don't know any information like this,
that's OK. But this kind of information can help us track
down problems, so please tell us what you can.)</P><P>We
will use all this information to diagnose the problem, and
we'll hopefully have you back up and Googlin' again quickly!</P>
<P>Please note that although we read all the email we
receive, we are not always able to send a personal response
to each and every email. So don't despair if you don't hear
back from us!</P>
<P>Also note that if you do not send us the <b>entire</b>
code below, <i>we will not be able to help
you</i>.</P><P>Best wishes,<BR>The Google
Team</BR></P><BLOCKQUOTE>/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/<BR>
AD1IFXbQ-8kjZGNiMTEAAGtHRVQgL3NlYXJjaD9xPXRlc3QmY<BR>
nRuST1JJ20rRmVlbGluZytMdWNreSBIVFRQLzEuMA0KSG9zdD<BR>
ogd3d3Lmdvb2dsZS5jb20NClVzZXItYWdlbnQ6IFB5dGhvbi1<BR>
1cmxsaWIvMi4wYTENCrSY3UI=<BR>
+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+<BR></BLOCKQUOTE>
</BODY></HTML>
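
The diagnostic code at the end of the page above can be inspected directly, which gives some support to the User-Agent theory. The sketch below assumes the code is URL-safe base64 (the "-" character suggests this) once the <BR> line breaks are removed; that is an inference from the text, not documented behaviour. The leading and trailing bytes are opaque and are skipped; only the embedded HTTP request is extracted. The function name embedded_request is hypothetical.

```python
import base64

# Diagnostic code copied from the 403 page, with the <BR> breaks removed.
CODE = (
    "AD1IFXbQ-8kjZGNiMTEAAGtHRVQgL3NlYXJjaD9xPXRlc3QmY"
    "nRuST1JJ20rRmVlbGluZytMdWNreSBIVFRQLzEuMA0KSG9zdD"
    "ogd3d3Lmdvb2dsZS5jb20NClVzZXItYWdlbnQ6IFB5dGhvbi1"
    "1cmxsaWIvMi4wYTENCrSY3UI="
)


def embedded_request(code):
    """Return the lines of the HTTP request embedded in the code.

    Assumes URL-safe base64. Bytes before "GET " and after the last
    CRLF are treated as opaque and dropped.
    """
    raw = base64.urlsafe_b64decode(code)
    start = raw.find(b"GET ")
    end = raw.rfind(b"\r\n")
    if start < 0 or end < start:
        raise ValueError("no embedded GET request found")
    return raw[start:end].decode("ascii").split("\r\n")


if __name__ == "__main__":
    for line in embedded_request(CODE):
        print(line)
```

Run against the code quoted above, this prints the request line, the Host header, and a final line reading "User-agent: Python-urllib/2.0a1", so the server had recorded the client's User-Agent in the code it asked users to send back.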
----------------------------------------------------------------------
Comment By: Michael Hudson (mwh)
Date: 2002-07-31 08:34
Message:
Logged In: YES
user_id=6656
Something even weirder happens when I try urllib2:
>>> urllib2.urlopen("http://www.google.com/search?q=test&btnI=I'm+Feeling+Lucky").geturl()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/mwh/src/python/dist/src/Lib/urllib2.py", line 136, in urlopen
    return _opener.open(url, data)
  File "/home/mwh/src/python/dist/src/Lib/urllib2.py", line 324, in open
    '_open', req)
  File "/home/mwh/src/python/dist/src/Lib/urllib2.py", line 303, in _call_chain
    result = func(*args)
  File "/home/mwh/src/python/dist/src/Lib/urllib2.py", line 792, in http_open
    return self.do_open(httplib.HTTP, req)
  File "/home/mwh/src/python/dist/src/Lib/urllib2.py", line 786, in do_open
    return self.parent.error('http', req, fp, code, msg, hdrs)
  File "/home/mwh/src/python/dist/src/Lib/urllib2.py", line 350, in error
    return self._call_chain(*args)
  File "/home/mwh/src/python/dist/src/Lib/urllib2.py", line 303, in _call_chain
    result = func(*args)
  File "/home/mwh/src/python/dist/src/Lib/urllib2.py", line 402, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden
(sf is going to mangle that traceback, I can tell).
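
The traceback above stops at the raised HTTPError, but the exception also carries the server's response, which is where the explanation for the 403 turned out to be. A minimal sketch of reading it, again in Python 3 (urllib.error and urllib.request instead of urllib2) and against a hypothetical local server that always answers 403; the handler, the body text, and the function names are assumptions for illustration only.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class ForbiddenHandler(BaseHTTPRequestHandler):
    """Hypothetical local server that rejects every GET with a 403."""

    def do_GET(self):
        body = b"403 Forbidden: client not allowed"
        self.send_response(403)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def status_and_body(url):
    """Return (status code, body bytes), for error responses as well."""
    # An empty ProxyHandler bypasses any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # HTTPError doubles as a file-like response object, so the
        # server's explanation can still be read from it.
        return err.code, err.read()


def run_forbidden_demo():
    """Start the local server on a free port, fetch /, shut down."""
    server = HTTPServer(("127.0.0.1", 0), ForbiddenHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return status_and_body("http://127.0.0.1:%d/" % server.server_port)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    code, body = run_forbidden_demo()
    print(code, body.decode("ascii"))
```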
----------------------------------------------------------------------