VERY basic browser question

Mon Feb 24 14:46:30 EST 2003

<cribeiro at mail.inet.com.br> wrote in message news:<mailman.1046026804.9881.python-list at python.org>...
> 2) The 'automation interface' approach. Most programs today have
> automation interfaces, that can be used to remotely control the program in
> much more advanced ways than possible using the technique described above.
> On the other hand, the use of the automation interface requires much more
> knowledge about the internal structure, as exposed in the automation API,
> of the application being controlled.
> 
> Most web browsers today do have automation interfaces. However, I don't
> know how difficult it is to solve your particular problem using this
> approach. You say that you have to fill in a form; to do it with
> automation, the 'form' object would have to be exposed, and I
> don't know if that is possible (my opinion: it should be possible,
> but it's probably not easy).
> 
> In short, this is a very elegant and extensible approach, but I think that
> it probably requires a lot of knowledge of the automation APIs. I may be
> wrong, though.

I can speak to using this approach with Internet Explorer on Windows.
It really doesn't require much knowledge of the automation API. Once
you have an InternetExplorer object, its Document property gives you
the DOM of the loaded page, so you can script both the page and the
browser from Python. There are a few odd corners (the links collection
in the example below is one), but otherwise it's quite easy to do.

Here's an example:

# google_ie.py
# launch Internet Explorer, visit google.com, search for a keyword,
# then display each matching document for a short period of time.

SEARCH_PHRASE = 'applesauce'

from win32com.client import Dispatch
import time

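# this defines a "helper" function we'll use below.
def wait(ie):
    "Given an IE object, wait until the object is ready for input."
    # Busy can still be false just after navigate() returns, so also
    # wait for ReadyState to reach 4 (READYSTATE_COMPLETE).
    while ie.Busy or ie.ReadyState != 4:
        time.sleep(.1)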

# create the browser window, make it visible
ie = Dispatch('InternetExplorer.Application')
ie.Visible = 1
ie.navigate('http://www.google.com/')
wait(ie)

# here we depend on foreknowledge of the page's design.
# it has a single form, the first element of the form is the
# place where you type in the keywords to look for.
form = ie.Document.forms[0]
search_for = form.elements[0]
search_for.value = SEARCH_PHRASE
form.submit()

wait(ie)

# now we have our search results, harvest the references
hrefs = []
for link in ie.Document.links:
    if link is None:
        # this is important. perhaps it's a bug in IE, I'm not sure,
        # but if link==None, then we're at the end of the list.
        # without the break, you would have an infinite loop.
        break

    # we only want links that don't refer back to Google servers
    if 'google' not in link.href:
        # print the text of the link, and the URL
        print(link.innerText)
        print('\t' + link.href)
        hrefs.append(link.href)

# now flip through the references, showing each for one second
for href in hrefs:
    # the try...except block helps in case there was a problem with
    # the page you're trying to load
    try:
        ie.navigate(href)
        wait(ie)
    except Exception:
        print('could not load ' + href)

    # display the page for one second...
    time.sleep(1)

# we're done, let the user know.
# load a blank page, and wait for it before touching its document
ie.navigate('about:blank')
wait(ie)
# write to it.
ie.Document.write("<h1>that's all, folks!</h1>")

# another way to do this (innerHTML belongs to elements such as
# body, not to the document itself):
#   ie.Document.body.innerHTML = "<h1>that's all, folks!</h1>"
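
The wait() helper above loops for as long as the browser says it is not
ready, so a page that never finishes loading would hang the script. A
variant with a timeout might look like the sketch below. The names
wait_until_ready and FakeBrowser are made up for illustration;
FakeBrowser only stands in for the COM object so the helper can be
tried without IE. The one thing taken from IE itself is that ReadyState
reports 4 once a page has fully loaded.

```python
import time

READYSTATE_COMPLETE = 4  # value IE reports once a page has fully loaded


def wait_until_ready(browser, timeout=30.0, poll=0.1):
    """Poll `browser` until it is idle and fully loaded.

    Returns True when ready, False if `timeout` seconds pass first.
    """
    deadline = time.time() + timeout
    while browser.Busy or browser.ReadyState != READYSTATE_COMPLETE:
        if time.time() >= deadline:
            return False
        time.sleep(poll)
    return True


class FakeBrowser(object):
    """Hypothetical stand-in for the IE COM object (not part of win32com)."""

    def __init__(self, busy_polls):
        # number of times Busy will answer True before settling down
        self._remaining = busy_polls

    @property
    def Busy(self):
        if self._remaining > 0:
            self._remaining -= 1
            return True
        return False

    @property
    def ReadyState(self):
        return READYSTATE_COMPLETE if self._remaining == 0 else 1
```

In the script, a call such as wait_until_ready(ie, timeout=20) could
replace wait(ie) inside the try block, treating a False result as
"could not load".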

I use this approach for testing Web applications. If you have a
computer lab at your disposal, it's not hard to set up a simple
master/slave (or publish/subscribe) system so that all the computers
run a similar script and hit your site at once. It's a cheap but
effective approach to load testing.
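
One way such a setup could be wired together is sketched below. This is
not the system described above, only a guess at its simplest form: a
master that waits for every machine to check in over TCP and then
releases them all together. run_master, run_worker and demo are
hypothetical names, and demo() runs everything as threads on one
machine purely to show the handshake. In a lab, each machine would call
run_worker with a job that drives IE as in google_ie.py.

```python
import socket
import threading


def run_master(server_sock, n_workers):
    """Accept n_workers connections, then release them all together."""
    conns = []
    while len(conns) < n_workers:
        conn, _addr = server_sock.accept()
        conns.append(conn)
    for conn in conns:
        conn.sendall(b'GO')
        conn.close()


def run_worker(host, port, job):
    """Connect to the master, block until it says GO, then run job()."""
    sock = socket.create_connection((host, port))
    data = b''
    try:
        # read until the master closes the connection
        while True:
            chunk = sock.recv(16)
            if not chunk:
                break
            data += chunk
    finally:
        sock.close()
    if data == b'GO':
        return job()
    return None


def demo(n_workers=3):
    """Run master and workers as threads on one machine, for illustration."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(('127.0.0.1', 0))   # port 0: let the OS pick a free port
    server.listen(n_workers)
    port = server.getsockname()[1]

    results = []
    lock = threading.Lock()

    def job():
        # placeholder for the real browser-driving script
        with lock:
            results.append('ran')

    workers = [threading.Thread(target=run_worker,
                                args=('127.0.0.1', port, job))
               for _ in range(n_workers)]
    for t in workers:
        t.start()
    run_master(server, n_workers)
    for t in workers:
        t.join()
    server.close()
    return results
```

The master holds every connection open until the last worker arrives,
so the machines start within a network round trip of each other rather
than whenever each one happens to be launched.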

-- Graham