Headers issue with urllib2 and ClientCookie (was: ClientCookie.read() failing on some servers)

Patrick.Bussi at space.alcatel.fr Patrick.Bussi at space.alcatel.fr
Tue Apr 8 10:49:41 CEST 2003



I did not receive any answer, but had there been one, it could fairly have been
"RTFM", because the excellent ClientCookie.__doc__ addresses the issue. The
answers to my own questions are:

A/ Yes, definitely. Some servers are sensitive to incorrect headers, while
others are not.
B/ My previous program tried to add the User-agent header as an ordinary header,
and it omitted the Referer header.

For the record, here is an extract of the docstring:

------snip------
Sometimes, a server wants particular headers set to the values it expects, or it
won't play nicely.  The most  frequent offenders here are the Referer [sic] and
/ or User-Agent HTTP headers.
[...]
urllib2.OpenerDirector automatically adds a User-Agent header to every request.
Since ClientCookie.urlopen uses an OpenerDirector instance, you need to install
your own OpenerDirector using the ClientCookie.install_opener function to change
this behaviour :
>> opener.addheaders = [("User-agent",
"Mozilla/5.5.(X11;.U;.Linux.2.4;.en-US;.0.8).Gecko/20010409")]
[..]
If things don't seem to be working as expected, the first thing to try is
to switch off RFC 2965 handling, using the netscape_only argument to the
Cookies constructor.  This is because few browsers implement it, so it is
likely that some servers incorrectly implement it.  This switch is also
useful because ClientCookie does not yet fully implement redirects with
RFC 2965 cookies.
------snip------
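
For readers on Python 3, where urllib2 and ClientCookie no longer exist, the two
tips from the docstring can be sketched with the standard library only. This is
a minimal sketch and not ClientCookie's API: make_opener and BROWSER_UA are
names made up for illustration, and the User-Agent value is an arbitrary
browser-like string.

```python
import urllib.request
from http.cookiejar import CookieJar, DefaultCookiePolicy

# Hypothetical browser-like string, chosen only for illustration.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) Gecko/20100101 Firefox/115.0"


def make_opener(user_agent=BROWSER_UA):
    """Return (opener, jar) with a custom User-Agent and Netscape-only cookies."""
    # rfc2965=False keeps RFC 2965 handling switched off (it is also the default).
    policy = DefaultCookiePolicy(netscape=True, rfc2965=False)
    jar = CookieJar(policy)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Replacing addheaders drops the default "Python-urllib/x.y" User-Agent.
    opener.addheaders = [("User-agent", user_agent)]
    return opener, jar
```

Calling urllib.request.install_opener(opener) afterwards makes plain
urllib.request.urlopen() use these settings, much as ClientCookie.install_opener
does in the extract above.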

Here is the working program:

------snip------
#! /usr/bin/env python
'''usage:
         $ python test111-03.py www.python.org  > result.log 2>&1

'''

def test(h):
    '''h is the host name. Caution : no protection against mistakes
    '''
    import ClientCookie, urllib2
    ClientCookie.HTTP_DEBUG = 1
    ClientCookie.CLIENTCOOKIE_DEBUG = 1
    cookies = ClientCookie.Cookies(netscape_only=1)
    hh = ClientCookie.HTTPHandler(cookies, handle_refresh=1)
    opener = ClientCookie.build_opener(hh)
    opener.addheaders = [("User-agent",
        "Mozilla/5.5.(X11;.U;.Linux.2.4;.en-US;.0.8).Gecko/20010409")]
    ClientCookie.install_opener(opener)

    req = urllib2.Request('http://'+h)
    req.add_header("Referer", 'http://'+h+'/') # ugly. I know
    req.add_header('Accept','*/*')
    req.add_header('Accept-Language','en')
    req.add_header('Accept-Encoding','gzip,deflate,compress,identity')
    req.add_header('Keep-Alive','300')
    req.add_header('Connection','keep-alive')

    response=ClientCookie.urlopen(req)
    print '\nResponse from server %s :' % response.geturl()
    print response.info()
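    # Trailing comma: each line already ends with its own newline.
    for line in response.readlines():
        print line,

if __name__ == '__main__':
    import os, sys
    # os.uname()[2:] is (release, version, machine)
    print 'Linux', ' '.join(os.uname()[2:])
    print 'Python', sys.version
    try:
        h = sys.argv[1]
    except IndexError:
        # No host given on the command line: show the usage text.
        sys.exit(__doc__)
    test(h)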
------snip------
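
One caveat about the Accept-Encoding header: a server that honours it may send
a compressed body, and urllib.request in Python 3 does not decompress it for
you. The sketch below rebuilds the request headers with the Python 3 standard
library and decodes the body by hand. build_request, decode_body and fetch are
illustrative names, the URL gets a trailing slash for simplicity, and only gzip
and zlib-wrapped deflate are handled.

```python
import gzip
import urllib.request
import zlib


def build_request(host):
    """Build a GET request carrying browser-like headers for the given host."""
    url = "http://" + host + "/"
    request = urllib.request.Request(url)
    request.add_header("Referer", url)
    request.add_header("Accept", "*/*")
    request.add_header("Accept-Language", "en")
    # Only advertise encodings that decode_body() below can undo.
    request.add_header("Accept-Encoding", "gzip, deflate, identity")
    return request


def decode_body(raw, content_encoding):
    """Undo the Content-Encoding reported by the server, if any."""
    encoding = (content_encoding or "identity").strip().lower()
    if encoding == "gzip":
        return gzip.decompress(raw)
    if encoding == "deflate":
        # Assumes zlib-wrapped deflate; raw deflate streams are not handled.
        return zlib.decompress(raw)
    return raw


def fetch(host):
    """Fetch the front page of host and return the decoded body as bytes."""
    with urllib.request.urlopen(build_request(host), timeout=10) as response:
        encoding = response.headers.get("Content-Encoding")
        return decode_body(response.read(), encoding)
```

Note that Request.add_header() normalises header names with str.capitalize(),
so the header is read back as request.get_header("Accept-language").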


In addition, a message in this list archive
(http://mail.python.org/pipermail/python-list/2003-February/150438.html)
addressed an issue with http_error_302, and my own error message was:
urllib2.HTTPError: HTTP Error 302: The HTTP server returned a redirect error
that would lead to an infinite loop.
The issue raised by Akai might therefore be solved by using ClientCookie, which
seems to handle redirections well.
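
One plausible cause of such a redirect loop, offered as an assumption and not as
a diagnosis of Akai's case, is a server that redirects until the client sends
back a cookie it has just set. The sketch below reproduces that situation with a
made-up local server (CookieGate) and the Python 3 standard library: without
cookie handling the opener gives up with HTTP Error 302, and with a cookie jar
the redirect is followed successfully.

```python
import threading
import urllib.error
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer


class CookieGate(BaseHTTPRequestHandler):
    """Hypothetical server: redirects to itself until the client sends a cookie."""

    def do_GET(self):
        if "seen=1" in (self.headers.get("Cookie") or ""):
            body = b"welcome"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(302)
            self.send_header("Location", "/")
            self.send_header("Set-Cookie", "seen=1")
            self.send_header("Content-Length", "0")
            self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch(url, with_cookies):
    """Return the page body, or a short error string if the fetch fails."""
    # ProxyHandler({}) ignores proxy settings so the local server is hit directly.
    handlers = [urllib.request.ProxyHandler({})]
    if with_cookies:
        handlers.append(urllib.request.HTTPCookieProcessor(CookieJar()))
    opener = urllib.request.build_opener(*handlers)
    try:
        with opener.open(url, timeout=5) as response:
            return response.read().decode()
    except urllib.error.HTTPError as err:
        return "HTTP Error %d" % err.code


def demo():
    """Run both fetches against a throwaway server on a free local port."""
    server = HTTPServer(("127.0.0.1", 0), CookieGate)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/" % server.server_address[1]
    try:
        return fetch(url, with_cookies=False), fetch(url, with_cookies=True)
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo() should return one result per fetch: the error string for the
cookie-less opener, then the page body for the opener that keeps cookies.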

Thanks to the list anyway.

Patrick.


---------------------- Forwarded by Patrick Bussi/ALCATEL-SPACE on 08/04/2003
09:34 ---------------------------

Patrick Bussi   07/04/2003 11:03

To:       python-list at python.org
cc:
Subject:  Headers issue with urllib2 and ClientCookie (was: ClientCookie.read()
      failing on some servers)



A/ Do the wrong headers explain the failure in the server response?
B/ What is wrong with the headers in my test program?



---
Patrick Bussi
patrick.bussi at space.alcatel.fr


Any opinions expressed are my own and not necessarily those of my Company.