Mimic a web surfer ?
John Hunter
jdhunter at ace.bsd.uchicago.edu
Tue Apr 1 09:28:03 EST 2003
>>>>> "shagshag13" == shagshag13 <shagshag13 at yahoo.fr> writes:
shagshag13> hello, i would like to develop a script which will
shagshag13> mimic a web surfer, that is :
shagshag13> - follow links as a human and thus always come from
shagshag13> previous page (i think there is something to do with
shagshag13> http 'referer' and cookies) - be able to log if
shagshag13> necessary
shagshag13> is there anything already existing ? i'm looking for
shagshag13> any tips which will help me to start...
shagshag13> note : this could be used to have "automatic" access
shagshag13> to free webmails not providing pop protocol.
Yes, this is possible. As you say, you generally need to set the
referer and/or cookies, and this can be done with httplib. I often
find the best way to work out what a site expects is to install a
sniffer which monitors traffic on port 80. If you are on linux, you
may want to try sniffit. On windows, I have had luck with proxomitron.
With one of these running, browse the site you want to mimic and
capture all the traffic. Pore through the logs, figure out how the
site wants you to handle cookies and referers, and then whip up your
python version. Here is an example script using httplib that sets the
referer and cookie headers:

import httplib
import socket

host = 'slashdot.org'
pathn = '/science/01/12/03/1630212.shtml'

try:
    h = httplib.HTTP(host)
    h.putrequest('GET', pathn)
    h.putheader('Accept', 'text/html')
    h.putheader('Accept', 'text/plain')
    h.putheader('User-Agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.4) Gecko/20010913')
    h.putheader('From', 'marshall at migley.zko.dec.com')
    h.putheader('Cookie', 'mycookieval')
    # Referer is an ordinary header, so it is sent with putheader
    h.putheader('Referer', 'http://myserver.com')
    h.endheaders()
    errcode, errmsg, headers = h.getreply()
except socket.error, er:
    print 'socket error ', er
else:
    # only reached if the request went through, so errcode is defined
    print errcode
    hkeys = headers.keys()
    #for key in hkeys:
    #    print key, ' = ', headers.getrawheader(key)
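
To follow links the way a browser does, each new request should carry
the previous page's URL as its referer and send back whatever cookies
the last reply set. Below is a rough sketch of that step, written for
Python 3, where httplib is called http.client. The function names
next_request_headers and fetch are made up for illustration, and the
cookie handling is deliberately naive: it ignores domain, path and
expiry, which a real site may care about.

```python
import http.client
from http.cookies import SimpleCookie


def next_request_headers(previous_url, set_cookie_values):
    """Build the headers for the next request in a browsing session.

    previous_url      -- URL of the page we are "coming from" (sent as Referer)
    set_cookie_values -- raw Set-Cookie header values from the last reply
    """
    jar = SimpleCookie()
    for raw in set_cookie_values:
        jar.load(raw)  # parses name=value plus attributes such as Path

    headers = {
        'Accept': 'text/html, text/plain',
        'User-Agent': 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.4) Gecko/20010913',
        'Referer': previous_url,
    }
    if jar:
        # Send back only the name=value pairs, as a browser would
        headers['Cookie'] = '; '.join(
            '%s=%s' % (name, morsel.value) for name, morsel in jar.items())
    return headers


def fetch(host, path, headers):
    """GET http://host/path; return (status, reply headers, body)."""
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request('GET', path, headers=headers)
        resp = conn.getresponse()
        return resp.status, resp.getheaders(), resp.read()
    finally:
        conn.close()
```

After each fetch, the Set-Cookie values for the next call can be
collected from the reply headers with something like
[v for k, v in reply_headers if k.lower() == 'set-cookie'].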
John Hunter