Recording web page URLs
dave at pythonapocrypha.com
Fri Dec 12 18:51:20 CET 2003
> Hello everyone,
> I'm building a python application that will monitor a person's behavior with
his mouse while surfing the web (probably
> using Internet Explorer). I want to record the URLS of the pages he visits.
Is there a simple way to do this in python?
This is really a two-part question:
1) Is there a simple way, on the Windows platform, to capture URLs visited?
2) How do I do this from Python?
Unless someone here has specific experience solving #1, there will probably be
other websites/newsgroups more knowledgeable in answering that question (but
people here might still be able to help), but once you know the answer to #1
you will probably find lots of help here in answering #2. :)
Here's my attempt at answering #1, but remember that I'm no Windows expert so
there may be much better ways of doing this:
a- configure the browser to access the internet through a local proxy
b- use something like winpcap + urlsnarf to do packet & url sniffing
c- implement an IE BrowserHelper object or use some other COM interface
d- browse from inside an application that hosts an Internet Explorer object
e- some win32API hackery wherein you figure out the location of the mouse when
the click occurred, grab a handle to the appropriate window & the browser DOM,
and do magic to walk the DOM and deduce what link was clicked (ok, I'm
straining at ideas now, but you did say you were monitoring mouse behavior so
As far as question #2, all are possible from Python if they're possible in
Windows, although option 'e' is almost too bizarre. Specifically, I've used
options a, b, and d:
a - there are several open source proxies written in Python, so this one might
be easy. This works well as long as the proxy is written to gracefully handle
bad data. The HTTP spec is complicated and there are oodles of HTTP servers
that implement the spec in quirky ways.
b - this one is more tedious to implement and raises some security questions,
but is transparent to the user once installation has occurred. Unlike the proxy
approach, a failure in your code shouldn't ever interrupt the user's browsing,
which is nice. At home my Linux router runs urlsnarf and hands the data off to
a little Python app that logs URLs. At the end of the week everyone in my
family gets an email outlining sites visited from my computer, my wife's
computer, and my kids' computer - sort of a self-monitoring thing to make sure
we all behave. I'm in the process of moving it all to Windows, and it's
definitely doable there.
c - doable via ctypes or win32com, but I've never tried it. I know
BrowserHelper objects let you integrate with IE to some extent. You need to
find the same hook or API that download managers use when they "hijack" a
download from a browser to handle it themselves.
d - trivial (download wxPython, embed an IE instance, and capture the
OnBeforeNavigate events IIRC) but really weird for the user, and probably not
very useful except in some very controlled research project. :)
If you do some digging around on some more general purpose Windows programming
websites or lists and find something interesting, please let me know. :)
More information about the Python-list