[Tutor] How to log on to web site using Python?

Branimir Petrovic BranimirP@cpas.com
Wed Jan 1 19:32:04 2003


"""I'd really appreciate if someone could help me get this 'exercise'
straightened out.

I am trying to produce a Python script to monitor the 'health' of one
internal web site. The idea is to periodically request a page and time how
long it takes to get it. Should it become unacceptably long, I might want
to learn about it (via automated mailing), or I might even dare to empower
the script to just reboot the server.
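
To make the idea concrete, here is a minimal sketch of a single health
probe. The threshold value and the names `SLOW_LIMIT_SEC`, `timed_fetch`
and `check_health` are made up for illustration; what counts as
'unacceptably long' is an assumption, and the mailing or rebooting step
is left out entirely.

```python
import time

# Hypothetical threshold: the real limit for this site is not known.
SLOW_LIMIT_SEC = 5.0

def timed_fetch(fetch, url):
    # Call fetch(url) and return (page, elapsed seconds).
    t_start = time.time()
    page = fetch(url)
    return page, time.time() - t_start

def check_health(fetch, url, limit=SLOW_LIMIT_SEC):
    # Return 'ok', 'slow' or 'down' for one probe of url.
    try:
        page, elapsed = timed_fetch(fetch, url)
    except IOError:
        # Network-level failures from urllib surface as IOError.
        return 'down'
    if elapsed > limit:
        return 'slow'
    return 'ok'
```

Here `fetch` can be any function that takes a URL and returns the page,
such as the `urlOpener` function further down, and a scheduler or a
simple loop with `time.sleep` would call `check_health` periodically.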

The site in question is running Java Servlets (via ServletExec) on IIS 5.
For a variety of reasons, this unhappy combo (Servlets on IIS) is here to
stay, therefore I must find a way to automate logging on to figure out
if the site is 'still there', or if it needs a life-restoring 'reboot slap'.

Normally the site requires authentication and needs a browser with cookies
enabled. Maybe my problem has to do with cookies, maybe not...

Here is the critical part:
"""

import urllib, time

logonPage = 'http://myURL/servlet/Logon'
validUser = 'http://myURL/servlet/FindUser'

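def urlOpener(url, data=None):
    # Fetch url and return the page body as a string.
    # With data=None this is a GET; a dict is URL-encoded and sent as a POST.
    if isinstance(data, dict):
        data = urllib.urlencode(data)

    urlObj = urllib.urlopen(url, data)
    return urlObj.read()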

# This particular web page does not require authentication
# but contains the logon form:
tStart = time.time()
page = urlOpener(logonPage)
tStop = time.time()
print 'Tdld = %4.2f Sec\n\n' % (tStop - tStart)
print page

# The page is fetched (as it should be) and the printed output looks somewhat like:
"""
Tdld = 0.24 Sec


<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
<title>Logon</title>
...
<body>
<form name="logon_form" method="POST" action="http://myURL/servlet/FindUser"
onSubmit = "return validateLogon(this)">
...
<input type="text" name="pin" value="" + size="20" onFocus='this.select();'>
<input type="password" name="password" size="20" onFocus='this.select();'>

...
</body>
</html>
"""


# Now I'll try to POST the logon info, and this is the problematic part:
tStart = time.time()
page = urlOpener(validUser, {'pin': 'myUsrID', 'password': 'myPWD'})
tStop = time.time()
print 'Tdld = %4.2f Sec\n\n' % (tStop - tStart)
print page

# The printed output always looks like this, regardless of the user credentials used:
"""
Tdld = 0.66 Sec


<html>
<head><title>Base</title></head>
<body>
</body></html>
"""


# Should I try accessing the page *without* posting anything, like this:
tStart = time.time()
page = urlOpener(validUser)
tStop = time.time()
print 'Tdld = %4.2f Sec\n\n' % (tStop - tStart)
print page

# the output will be:
"""
Tdld = 0.11 Sec


<html>
<head><title>CQ_Base</title></head>
<body>
</body></html>
"""


"""
A much shorter Tdld in this case indicates logon failure, but I'd
much rather get the actual page, parse it, and draw my own conclusions
from it about whether the logon attempt was successful or not...
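
For illustration, parsing the returned page could look roughly like the
sketch below. It is written for Python 3 (`html.parser`), and the names
`PageSummary`, `summarize` and `looks_logged_in` are hypothetical. The
rule used to decide success is only an assumption based on the outputs
shown above: a page that still has the password field, or whose title is
empty, 'Base' or 'CQ_Base', is treated as a failed logon.

```python
from html.parser import HTMLParser

class PageSummary(HTMLParser):
    # Collects the <title> text and the names of all <input> fields.
    def __init__(self):
        HTMLParser.__init__(self)
        self.title = ''
        self.inputs = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == 'title':
            self._in_title = True
        elif tag == 'input':
            name = dict(attrs).get('name')
            if name:
                self.inputs.append(name)

    def handle_endtag(self, tag):
        if tag == 'title':
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def summarize(page):
    # Return (title, list of input field names) for an HTML string.
    parser = PageSummary()
    parser.feed(page)
    parser.close()
    return parser.title.strip(), parser.inputs

def looks_logged_in(page):
    # Assumption: the logon form or an empty 'Base' page means failure.
    title, inputs = summarize(page)
    return 'password' not in inputs and title not in ('', 'Base', 'CQ_Base')
```

What a successful logon page really looks like on this site is not known
here, so the check in `looks_logged_in` would need adjusting once a real
page can be fetched.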

Why is the fetched page basically empty even when a proper user ID
and password combo is used? Apparently I have it all wrong, but
what are my mistakes?
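
If cookies do turn out to be the problem, one thing to try is an opener
that keeps the cookies the server sets and sends them back on later
requests. The sketch below is for Python 3 (`urllib.request` and
`http.cookiejar`); the function names are hypothetical, and it assumes
the servlet hands out its session cookie when the logon page is fetched,
which has not been verified. The elided parts of the form might also
contain hidden fields that have to be posted along; that is a guess too.

```python
import urllib.parse
import urllib.request
import http.cookiejar

def make_session():
    # Build an opener that stores cookies from responses and replays them.
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    return opener, jar

def encode_form(fields):
    # URL-encode a dict of form fields into the bytes body of a POST.
    return urllib.parse.urlencode(fields).encode('ascii')

def log_on(opener, logon_url, action_url, fields):
    # GET the logon page first so any session cookie lands in the jar,
    # then POST the form fields to the form's action URL with that cookie.
    opener.open(logon_url).read()
    return opener.open(action_url, encode_form(fields)).read()
```

Usage would be roughly `opener, jar = make_session()` followed by
`page = log_on(opener, logonPage, validUser, {'pin': 'myUsrID',
'password': 'myPWD'})`, after which `jar` can be inspected to see
whether any cookie was actually set.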

Thanks,

Branimir
"""