Instrumented web proxy

Andrew McLean andrew-news at andros.org.uk
Thu Mar 27 16:25:55 EDT 2008


I would like to write a web (HTTP) proxy which I can instrument to 
automatically extract information from certain web sites as I browse 
them. Specifically, I would want to process URLs that match a particular 
regexp. For those URLs I would have code that parses the content and 
logs some of it.

Think of it as web scraping under manual control.
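
As a rough sketch of the kind of hook described above: the proxy would hand 
each URL and response body to a function like the one below. Everything here 
is hypothetical and only for illustration: the example.org pattern, the names 
inspect() and extract(), and the choice to pull out the page title. It also 
assumes the body has already been decoded to text.

```python
import logging
import re

# Hypothetical pattern: only item pages on example.org are of interest.
URL_PATTERN = re.compile(r"^http://www\.example\.org/items/\d+$")

# Hypothetical extraction rule: grab the contents of the <title> element.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)

log = logging.getLogger("scrape")


def extract(body):
    """Return the page title with surrounding whitespace removed, or None."""
    match = TITLE_RE.search(body)
    return match.group(1).strip() if match else None


def inspect(url, body):
    """Log the extracted value for URLs that match URL_PATTERN.

    Returns the extracted value, or None if the URL does not match or
    nothing could be extracted, so the hook is easy to test on its own.
    """
    if not URL_PATTERN.search(url):
        return None
    value = extract(body)
    if value is not None:
        log.info("%s -> %s", url, value)
    return value
```

Keeping the matching and parsing in a plain function like this means it can be 
tested without running a proxy at all, and then called from whichever proxy 
ends up being used.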

I found this list of Python web proxies:

http://www.xhaus.com/alan/python/proxies.html

Tiny HTTP Proxy in Python looks promising, as it's nominally simple (not 
many lines of code):

http://www.okisoft.co.jp/esc/python/proxy/

It does what it's supposed to, but I'm a bit at a loss as to where to 
intercept the traffic. I suspect it should be quite straightforward, but 
I'm finding the code a bit opaque.
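
To make "where to intercept" concrete, here is a minimal sketch of a proxy 
with one obvious interception point. It is not Tiny HTTP Proxy's code: it is 
written against the current Python 3 standard library (http.server and 
urllib.request), handles plain HTTP GET only (no CONNECT/HTTPS, no POST), and 
turns any upstream failure, including 4xx/5xx responses, into a 502. PATTERN, 
captured, fetch(), relay(), InstrumentedProxy and make_proxy() are all 
hypothetical names chosen for the illustration.

```python
import re
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical pattern and in-memory sink, for illustration only.
PATTERN = re.compile(r"/items/\d+")
captured = []

# Opener that ignores any system proxy settings, so the proxy can never
# route its own upstream requests back through itself.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch(url):
    """Fetch url from the origin server; return (status, content type, body)."""
    with OPENER.open(url, timeout=10) as upstream:
        ctype = upstream.headers.get("Content-Type", "application/octet-stream")
        return upstream.status, ctype, upstream.read()


def relay(url, fetcher=fetch):
    """Fetch url, record the body if the URL matches, return the response."""
    status, ctype, body = fetcher(url)
    if PATTERN.search(url):
        # Interception point: the complete response body is available here.
        captured.append((url, body))
    return status, ctype, body


class InstrumentedProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # A browser configured to use an HTTP proxy sends the absolute URL
        # in the request line, so self.path is the full URL.
        try:
            status, ctype, body = relay(self.path)
        except Exception:
            self.send_error(502, "Upstream fetch failed")
            return
        self.send_response(status)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_proxy(port=8080):
    """Build (but do not start) a proxy server bound to localhost."""
    return ThreadingHTTPServer(("127.0.0.1", port), InstrumentedProxy)
```

Running make_proxy().serve_forever() and pointing the browser's HTTP proxy 
setting at 127.0.0.1:8080 would route plain HTTP page loads through relay(). 
Buffering the whole body before passing it on is the simplest way to get a 
complete document to parse, at the cost of streaming.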

Any suggestions?

Andrew


