Web client login with redirection and cookies

John Hunter jdhunter at ace.bsd.uchicago.edu
Tue Aug 6 22:23:48 CEST 2002


>>>>> "John" == John  <john_lewis at mindspring.com> writes:

    John> Hi, I'm trying to login to an intranet site that uses
    John> cookies and redirection for a web scraping script.  Are
    John> there any good examples of how to accomplish this in Python?
    John> I recently managed to get this type of login working in
    John> Perl, and am now playing around with this in Python.

    John> I have only been working with Perl for 6 months casually for
    John> a few database and web scraping applications for automating
    John> reporting, and have been thinking about switching to Python
    John> before I invest too much more time.  I am already struggling
    John> a bit in maintaining my fairly small amount of code as I
    John> only work on it a few days out of a month and thought Python
    John> might benefit me in this regard.

Without a site and the desired cookie values, I can't provide a working
example, but here is a low-level example where you can directly
manipulate the HTTP header and set the cookie and/or referer values.

There are friendlier http interfaces (see
http://groups.google.com/groups?q=FancyURLopener+cookie&ie=UTF-8&oe=UTF-8&hl=en&btnG=Google+Search)
but this should get you started.

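import httplib
import socket

host = 'slashdot.org'
pathn = '/science/01/12/03/1630212.shtml'
try:
    h = httplib.HTTP(host)
    h.putrequest('GET', pathn)
    # each putheader call adds one line to the request header
    h.putheader('Accept', 'text/html')
    h.putheader('Accept', 'text/plain')
    h.putheader('User-Agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.4) Gecko/20010913')
    h.putheader('Referer', 'http://migley.zko.dec.com/httpget.py')
    h.putheader('From', 'marshall@migley.zko.dec.com')
    # replace 'mycookieval' with the 'name=value' pair your site expects
    h.putheader('Cookie', 'mycookieval')
    # to send a different referer, change the value above, e.g.
    #    h.putheader('Referer', 'http://myserver.com')

    h.endheaders()
    # on a redirect, errcode is 301 or 302 and the new location is
    # available as headers.getheader('Location')
    errcode, errmsg, headers = h.getreply()
except socket.error, er:
    print 'socket error ', er
else:
    # only read the body if the request actually went through
    print h.getfile().read()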
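
For comparison, here is a rough sketch of the friendlier route, where
the library keeps the cookies and follows the redirects for you.  It is
written against the Python 3 standard library (urllib.request and
http.cookiejar), not the httplib.HTTP class used above.  The function
names, the form field names and the intranet.example URLs are all
made up for illustration; your site will want its own.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return (opener, jar); the opener stores cookies and follows redirects."""
    jar = http.cookiejar.CookieJar()
    # build_opener() installs HTTPRedirectHandler by default;
    # HTTPCookieProcessor saves Set-Cookie values into the jar and
    # sends them back on later requests made through the same opener.
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def make_login_request(url, fields, referer=None):
    """Build a POST request; the field names in `fields` are hypothetical."""
    data = urllib.parse.urlencode(fields).encode('ascii')
    req = urllib.request.Request(url, data=data)
    req.add_header('User-Agent', 'Mozilla/5.0')
    if referer:
        req.add_header('Referer', referer)
    return req


def login_and_fetch(login_url, fields, page_url):
    """Log in, then fetch a page with whatever cookies the login set."""
    opener, jar = make_session()
    with opener.open(make_login_request(login_url, fields)) as resp:
        resp.read()  # any redirect after the login is followed here
    with opener.open(page_url) as resp:
        return resp.read()
```

You would call it along the lines of
login_and_fetch('http://intranet.example/login', {'user': 'john',
'password': 'secret'}, 'http://intranet.example/report'), after
checking the login form on your site for the real URL and field names.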

JDH



