Mimic a web surfer ?

John Hunter jdhunter at ace.bsd.uchicago.edu
Tue Apr 1 09:28:03 EST 2003


>>>>> "shagshag13" == shagshag13  <shagshag13 at yahoo.fr> writes:

    shagshag13> hello, i would like to develop a script which will
    shagshag13> mimic a web surfer, that is :

    shagshag13> - follow links as a human and thus always come from
    shagshag13> previous page (i think there is something to do with
    shagshag13> http 'referer' and cookies) - be able to log if
    shagshag13> necessary

    shagshag13> is there anything already existing ? i'm looking for
    shagshag13> any tips which will help me to start...

    shagshag13> note : this could be used to have "automatic" access
    shagshag13> to free webmails not providing pop protocol.

Yes, this is possible.  As you say, you generally need to set referer
and or cookies.  This can be done with httplib.  I often find the best
way to do this is to install a sniffer which monitors traffic on port
80.  If you are on linux, you may want to try sniffit.  On windows, I
have had luck with proxomitron.  With one of these sniffers running,
browse the site you want to mimic and capture all the traffic.  Pore
through the logs and figure out how the site wants you to handle
cookies and referers, and then whip up your python version.  Here is
an example script using httplib  that sets referer and cookie headers

import httplib
import socket
host = 'slashdot.org'
pathn = '/science/01/12/03/1630212.shtml'
try:
    h = httplib.HTTP(host)
    h.putrequest('GET', pathn)
    h.putheader('Accept', 'text/html')
    h.putheader('Accept', 'text/plain')
    h.putheader('User-Agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.4) Gecko/20010913')
    h.putheader('From', 'marshall at migley.zko.dec.com')
    h.putheader('Cookie', 'mycookieval')
    h.putrequest('Referer', 'http://myserver.com')

    h.endheaders()
    errcode, errmsg, headers = h.getreply()
except socket.error, er:
    print 'socket error ', er

print errcode

hkeys = headers.keys()
#for key in hkeys:
#    print key, ' = ', headers.getrawheader(key)

John Hunter





More information about the Python-list mailing list