[Python-bugs-list] [ python-Bugs-588714 ] urllib.urlopen.geturl() and redirects

noreply@sourceforge.net noreply@sourceforge.net
Wed, 31 Jul 2002 09:52:34 -0700


Bugs item #588714, was opened at 2002-07-30 19:11
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=588714&group_id=5470

Category: Python Library
Group: Python 2.2.1
Status: Open
Resolution: None
Priority: 5
Submitted By: Matthias Klose (doko)
Assigned to: Jeremy Hylton (jhylton)
Summary: urllib.urlopen.geturl() and redirects

Initial Comment:
[From http://bugs.debian.org/146408]

From: Matthew Vernon <matthew@pick.ucam.org>
Subject: python2.2: urllib.urlopen.geturl() fails to
deal with redirects properly

urllib.urlopen.geturl() claims:

"The geturl() method returns the real URL of the page. In some cases,
the HTTP server redirects a client to another URL. The urlopen()
function handles this transparently, but in some cases the caller
needs to know which URL the client was redirected to. The geturl()
method can be used to get at this redirected URL."

But it appears not to:

>>> urllib.urlopen("http://www.google.com/search?q=test&btnI=I'm+Feeling+Lucky").geturl()
"http://www.google.com/search?q=test&btnI=I'm+Feeling+Lucky"

Doing the same by steam:

HEAD http://www.google.com/search?q=test&btnI=I'm+Feeling+Lucky HTTP/1.1
Host: www.google.com

HTTP/1.0 302 Moved Temporarily
Content-Length: 151
Server: GWS/2.0
Date: Thu, 09 May 2002 16:51:37 GMT
Location: http://www.toefl.org/
Content-Type: text/html
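
For comparison, here is a minimal sketch of the behaviour the documentation describes, written against the modern Python 3 module urllib.request rather than the Python 2.2 urllib discussed in this report. The local test server, the /start and /final paths, and the names RedirectHandler and fetch_final_url are all hypothetical, chosen only so the example runs without contacting Google.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: /start answers 302, /final answers 200."""

    def do_GET(self):
        if self.path == "/start":
            self.send_response(302)
            self.send_header("Location", "/final")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"landed"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def fetch_final_url(path="/start"):
    """Return (base_url, geturl() result) after following the redirect."""
    server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        base = "http://127.0.0.1:%d" % server.server_port
        # An empty ProxyHandler bypasses any proxy configured in the environment.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(base + path, timeout=5) as response:
            return base, response.geturl()
    finally:
        server.shutdown()
        server.server_close()
```

Calling fetch_final_url() should return a geturl() value ending in /final, which is the result the reporter expected from the Google URL above.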



----------------------------------------------------------------------

>Comment By: Jeremy Hylton (jhylton)
Date: 2002-07-31 16:52

Message:
Logged In: YES 
user_id=31392

The body of the error message is interesting.  Google is
explicitly refusing to serve requests issued by urllib and
urllib2.  It appears to be keying on the User-Agent field.

<HTML><HEAD><TITLE>403 Forbidden</TITLE></HEAD>
<BODY><H1>403 Forbidden</H1>
Your client does not have permission to get URL
<code>/search?q=test&amp;btnI=I'm+Feeling+Lucky</code> from
this server.
(Client IP address: 208.251.201.35)<BR><BR>
Please see Google's Terms of Service posted at
http://www.google.com/terms_of_service.html
<BR><BR><P>If you believe that you have received this
response in error, please send email to <A
href="mailto:forbidden@google.com">forbidden@google.com</A>.
 Before sending this email, however, please make sure to
take a look at our Terms of Service
(http://www.google.com/terms_of_service.html).In your email,
please send us the <b>entire</b> code displayed below. 
Please also send us any information you may know about how
you are performing your Google searches-- for example, "I'm
using the Opera browser on Linux to do searches from home. 
My Internet access is through a dial-up account I have with
the FooCorp ISP." or "I'm using the Konqueror browser on
Linux to search from my job at myFoo.com.  My machine's IP
address is 10.20.30.40, but all of myFoo's web traffic goes
through some kind of proxy server whose IP address is
10.11.12.13."  (If you don't know any information like this,
that's OK.  But this kind of information can help us track
down problems, so please tell us what you can.)</P><P>We
will use all this information to diagnose the problem, and
we'll hopefully have you back up and Googlin' again quickly!</P>
<P>Please note that although we read all the email we
receive, we are not always able to send a personal response
to each and every email.  So don't despair if you don't hear
back from us!</P>
<P>Also note that if you do not send us the <b>entire</b>
code below, <i>we will not be able to help
you</i>.</P><P>Best wishes,<BR>The Google
Team</BR></P><BLOCKQUOTE>/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/<BR>
AD1IFXbQ-8kjZGNiMTEAAGtHRVQgL3NlYXJjaD9xPXRlc3QmY<BR>
nRuST1JJ20rRmVlbGluZytMdWNreSBIVFRQLzEuMA0KSG9zdD<BR>
ogd3d3Lmdvb2dsZS5jb20NClVzZXItYWdlbnQ6IFB5dGhvbi1<BR>
1cmxsaWIvMi4wYTENCrSY3UI=<BR>
+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+/+<BR></BLOCKQUOTE>

</BODY></HTML>
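
The code block in the error page looks like base64. Decoding it is one way to check the User-Agent theory. The sketch below assumes the URL-safe alphabet (the block contains a "-") and treats the leading and trailing bytes as opaque; the names BLOB and decoded are only for illustration.

```python
import base64

# The four data lines from the error page, with the <BR> tags removed.
BLOB = (
    "AD1IFXbQ-8kjZGNiMTEAAGtHRVQgL3NlYXJjaD9xPXRlc3QmY"
    "nRuST1JJ20rRmVlbGluZytMdWNreSBIVFRQLzEuMA0KSG9zdD"
    "ogd3d3Lmdvb2dsZS5jb20NClVzZXItYWdlbnQ6IFB5dGhvbi1"
    "1cmxsaWIvMi4wYTENCrSY3UI="
)

# Assumption: URL-safe base64, since "-" is not in the standard alphabet.
decoded = base64.urlsafe_b64decode(BLOB)

# Print the raw bytes; the HTTP request is embedded between binary fields.
print(decoded)
```

The decoded bytes contain the original request line, the Host header, and the header "User-agent: Python-urllib/2.0a1", so the server did record the urllib User-Agent with the rejected request.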


----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2002-07-31 08:34

Message:
Logged In: YES 
user_id=6656

Something even weirder happens when I try urllib2:

>>> urllib2.urlopen("http://www.google.com/search?q=test&btnI=I'm+Feeling+Lucky").geturl()

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/mwh/src/python/dist/src/Lib/urllib2.py", line 136, in urlopen
    return _opener.open(url, data)
  File "/home/mwh/src/python/dist/src/Lib/urllib2.py", line 324, in open
    '_open', req)
  File "/home/mwh/src/python/dist/src/Lib/urllib2.py", line 303, in _call_chain
    result = func(*args)
  File "/home/mwh/src/python/dist/src/Lib/urllib2.py", line 792, in http_open
    return self.do_open(httplib.HTTP, req)
  File "/home/mwh/src/python/dist/src/Lib/urllib2.py", line 786, in do_open
    return self.parent.error('http', req, fp, code, msg, hdrs)
  File "/home/mwh/src/python/dist/src/Lib/urllib2.py", line 350, in error
    return self._call_chain(*args)
  File "/home/mwh/src/python/dist/src/Lib/urllib2.py", line 303, in _call_chain
    result = func(*args)
  File "/home/mwh/src/python/dist/src/Lib/urllib2.py", line 402, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden

(sf is going to mangle that traceback, I can tell).
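
A rough sketch of how a 403 keyed on the User-Agent header can be reproduced and worked around. It uses Python 3's urllib.request and urllib.error (the successors of urllib2), not the code in the traceback. The local server, its rejection rule, the "ExampleClient/0.1" string, and the names PickyHandler, status_for and demo are all hypothetical; Google's actual filtering logic is not known from this report.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class PickyHandler(BaseHTTPRequestHandler):
    """Hypothetical server that rejects urllib's default User-Agent."""

    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        if agent.startswith("Python-urllib"):
            status, body = 403, b"Forbidden"
        else:
            status, body = 200, b"ok"
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def status_for(url, user_agent=None):
    """Return the HTTP status, catching HTTPError instead of propagating it."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    request = urllib.request.Request(url, headers=headers)
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(request, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as err:
        code = err.code
        err.close()
        return code


def demo():
    """Return the statuses for the default and a custom User-Agent."""
    server = HTTPServer(("127.0.0.1", 0), PickyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = "http://127.0.0.1:%d/" % server.server_port
        return status_for(url), status_for(url, "ExampleClient/0.1")
    finally:
        server.shutdown()
        server.server_close()
```

Against this toy server, the default header is refused and the custom one is accepted. Whether overriding the header is appropriate for a real site depends on that site's terms of service.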

----------------------------------------------------------------------
