Instrumented web proxy

Andrew McLean andrew-news at andros.org.uk
Fri Mar 28 14:11:56 EDT 2008


Paul Rubin wrote:
> Andrew McLean <andrew-news at andros.org.uk> writes:
>> I would like to write a web (http) proxy which I can instrument to
>> automatically extract information from certain web sites as I browse
>> them. Specifically, I would want to process URLs that match a
>> particular regexp. For those URLs I would have code that parsed the
>> content and logged some of it.
>>
>> Think of it as web scraping under manual control.
> 
> I've used Proxy 3 for this, a very cool program with powerful
> capabilities for on the fly html rewriting.
> 
> http://theory.stanford.edu/~amitp/proxy.html

This looks very useful. Unfortunately, I can't get it to run under
Windows (specifically Vista) with Python 1.5.2, 2.2.3 or 2.5.2.
I'll try Linux if I get a chance.
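
For reference, the step described in the original post (act only on URLs
that match a regexp, then parse the content and log some of it) could be
sketched roughly as below. This is only an illustration and has nothing to
do with how Proxy 3 works: the example.com pattern, the handle_response
function and the choice of extracting the page title are all made up, and
it assumes the proxy can hand over each URL together with its decoded
HTML body.

```python
import logging
import re

# Hypothetical pattern: only URLs matching this regexp are processed.
URL_PATTERN = re.compile(r"^https?://(www\.)?example\.com/items/\d+")

# Crude title extraction; a real scraper would likely use an HTML parser.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)

log = logging.getLogger("scrape")


def handle_response(url, body):
    """Hypothetical hook, called by the proxy for each response it relays.

    Returns the extracted title if the URL matched and a title was found,
    otherwise None. The response itself is never modified.
    """
    if not URL_PATTERN.search(url):
        return None  # not a URL of interest, pass through untouched
    match = TITLE_RE.search(body)
    if match is None:
        return None
    # Collapse runs of whitespace so the log entry stays on one line.
    title = " ".join(match.group(1).split())
    log.info("%s -> %s", url, title)
    return title
```

Keeping the matching and logging in one function like this means it can be
tried out on saved pages before any proxy is involved.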



