Instrumented web proxy

Andrew McLean andrew-news at
Thu Mar 27 16:25:55 EDT 2008

I would like to write a web (http) proxy which I can instrument to 
automatically extract information from certain web sites as I browse 
them. Specifically, I would want to process URLs that match a particular 
regexp. For those URLs I would have code that parsed the content and 
logged some of it.
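
To make that concrete, here is a minimal sketch of the kind of hook I 
mean. It assumes Python 3 and a crude regexp-based title extraction; 
the names URL_PATTERN, TITLE_RE and inspect_response are hypothetical, 
and the example.com pattern is only a placeholder.

```python
import logging
import re

logger = logging.getLogger("proxy.scrape")

# Hypothetical pattern for the URLs of interest (placeholder site).
URL_PATTERN = re.compile(r"^http://www\.example\.com/items/\d+$")

# Crude stand-in for "parse the content": pull out the <title> text.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def inspect_response(url, body):
    """Log the page title for matching URLs.

    Returns the extracted title, or None if the URL does not match
    or the page has no <title> element.
    """
    if not URL_PATTERN.search(url):
        return None  # not a URL we care about; pass through untouched
    match = TITLE_RE.search(body)
    title = match.group(1).strip() if match else None
    logger.info("%s -> title=%r", url, title)
    return title
```

The proxy would call inspect_response(url, body) once per fetched page 
and relay the page unchanged either way.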

Think of it as web scraping under manual control.

I found this list of Python web proxies

Tiny HTTP Proxy in Python looks promising, as it's nominally simple (not 
many lines of code).

It does what it's supposed to, but I'm a bit at a loss as to where to 
intercept the traffic. I suspect it should be quite straightforward, but 
I'm finding the code a bit opaque.
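
For reference, this is roughly the shape I am after, sketched with the 
standard library's http.server rather than Tiny HTTP Proxy's own code. 
It assumes Python 3 and handles plain-HTTP GET only (no CONNECT/HTTPS, 
no forwarding of request headers). The names InstrumentedProxy, 
on_response and captured, and the URL pattern, are made up for 
illustration.

```python
import re
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical pattern for the URLs to instrument.
URL_PATTERN = re.compile(r"/items/\d+$")

# (url, body) pairs recorded by the hook; a real version might log instead.
captured = []


def on_response(url, body):
    """Hook called with every fetched page; records the matching ones."""
    if URL_PATTERN.search(url):
        captured.append((url, body))


# Opener with no proxies configured, so environment proxy settings
# cannot make the proxy fetch through itself.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class InstrumentedProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # A client configured to use an HTTP proxy sends the absolute
        # URL in the request line, so self.path is the full URL.
        url = self.path
        try:
            with _opener.open(url, timeout=10) as upstream:
                status = upstream.status
                content_type = upstream.headers.get(
                    "Content-Type", "application/octet-stream")
                body = upstream.read()
        except Exception as exc:
            # Simplification: any upstream failure, including 4xx/5xx
            # responses (urllib raises for those), is reported as 502.
            self.send_error(502, "Upstream fetch failed: %s" % exc)
            return

        on_response(url, body)  # <-- the interception point

        # Relay the response to the client unchanged.
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # silence the default per-request logging to stderr


def serve(port=8080):
    """Run the proxy on localhost until interrupted."""
    ThreadingHTTPServer(("127.0.0.1", port), InstrumentedProxy).serve_forever()
```

Calling serve() and pointing the browser's HTTP proxy setting at 
127.0.0.1:8080 would route plain-HTTP page loads through on_response. 
What I can't see is where the equivalent point is in the existing code.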

Any suggestions?


More information about the Python-list mailing list