Instrumented web proxy
Andrew McLean
andrew-news at andros.org.uk
Thu Mar 27 16:25:55 EDT 2008
I would like to write a web (HTTP) proxy that I can instrument to
automatically extract information from certain web sites as I browse
them. Specifically, I want to process URLs that match a particular
regexp. For those URLs I would have code that parses the content and
logs some of it.
Think of it as web scraping under manual control.
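To make that concrete, here is a rough sketch of the kind of hook I have
in mind. The URL pattern, the name process_response and the title
extraction are all placeholders made up for illustration; the idea is
only that the proxy would call something like this with each URL and the
decoded page body.

```python
import logging
import re

# Hypothetical pattern selecting the URLs to instrument (an assumption).
URL_PATTERN = re.compile(r"^http://www\.example\.com/items/\d+$")

# Crude stand-in for real parsing: grab the contents of the <title> tag.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)

log = logging.getLogger("scrape")


def process_response(url, body):
    """Hypothetical hook, to be called by the proxy for every page fetched.

    Returns the extracted title, or None if the URL is not one of the
    instrumented ones or the page has no <title>.
    """
    if not URL_PATTERN.search(url):
        return None  # not a URL we care about; pass it through untouched
    match = TITLE_RE.search(body)
    if match is None:
        return None
    title = match.group(1).strip()
    log.info("%s -> %s", url, title)  # log the extracted piece
    return title
```

The hook only reads the body and returns, so the proxy could still relay
the original response to the browser unchanged.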
I found this list of Python web proxies
http://www.xhaus.com/alan/python/proxies.html
Tiny HTTP Proxy in Python looks promising, as it's nominally simple (not
many lines of code):
http://www.okisoft.co.jp/esc/python/proxy/
It does what it's supposed to, but I'm a bit at a loss as to where to
intercept the traffic. I suspect it should be quite straightforward, but
I'm finding the code a bit opaque.
Any suggestions?
Andrew