Filtering web proxy
erno at iki.fi
Mon Apr 17 23:41:13 CEST 2000
>>>>> "Oleg" == Oleg Broytmann <phd at phd.russ.ru> writes:
Oleg> Hello! I want a filtering web proxy. I can write one
Oleg> myself, but if there is a thing already... well, I don't
Oleg> want to reinvent the wheel. If there is such a thing (free and
Oleg> opensource, 'course), I'll extend it for my needs.
i am using junkbuster. it's simple, and works well. i don't keep
images turned on anyway. it can block cookies selectively (i have
everything but slashdot blocked), hide/spoof user-agent, and use other
http header tricks.
it doesn't do html parsing, and it's not written in python though.
but i haven't missed anything from it; with images turned
off there's not so much need for html parsing...
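(the nice thing about that kind of filtering is that it works purely on
the http headers, no html parsing needed. a hypothetical sketch of the
cookie/user-agent part -- the function name, the allowlist and the spoofed
string are all made up for illustration, this is not junkbuster's actual
config format:)

```python
# block cookies except for an allowlist of hosts, and spoof the
# user-agent -- purely header-level filtering, no html parsing.
ALLOW_COOKIES = {"slashdot.org"}  # hypothetical allowlist

def filter_headers(host, headers):
    out = {}
    for name, value in headers.items():
        if name.lower() == "cookie" and host not in ALLOW_COOKIES:
            continue  # drop the cookie entirely
        if name.lower() == "user-agent":
            value = "Mozilla/4.0 (generic)"  # spoofed value, made up
        out[name] = value
    return out
```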
Oleg> I wrote dozen HTML parsers in Python, so I can write one
Oleg> more, and turn it into a proxy, but may be I can start with
Oleg> some already debugged code?
an html parser would need to work incrementally, unless you want to
wait for the whole document to be transferred over the network before
seeing any of it rendered.
i guess you could do it incrementally with sgmllib (iirc you feed it
the data a string at a time?), but you run into the fact that a large
share of the html documents on the web are malformed and rely on the
error-correcting heuristics of the major browsers to function...
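(for what it's worth, the incremental feeding looks something like this.
the sketch below uses html.parser, the stdlib module that replaced
sgmllib in python 3 -- same feed()-a-chunk-at-a-time idea, and it's
fairly tolerant of sloppy markup. the chunk boundaries and urls are
made up to simulate data arriving off the network:)

```python
from html.parser import HTMLParser

# handlers fire as soon as complete tokens are available, so you can
# start filtering/rendering before the whole document has arrived.
class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

p = LinkCollector()
# simulate network chunks; note the tag split mid-attribute --
# the parser buffers the incomplete token until the rest arrives
for chunk in ['<html><body><a hre', 'f="http://a.example/">x</a>',
              '<a href="http://b.example/">y</a></body></html>']:
    p.feed(chunk)
print(p.links)  # -> ['http://a.example/', 'http://b.example/']
```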
one starting point could be the "gray proxy" (i forget what it was
really called). that was written on top of medusa, i think there was
an announcement here? probably a year or so ago. it parsed the html
and changed all the colors to grayscale, and did the same for
images. medusa isn't free though.. (except the version in zope?)
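(the color rewrite itself is the easy part once you've parsed out the
attribute value -- a sketch, assuming #rrggbb hex colors and the usual
luminance weights; the function name is just for illustration:)

```python
# convert an html hex color like "#3366cc" to its grayscale
# equivalent, the way a grayscale-rewriting proxy would.
def to_gray(color):
    r, g, b = (int(color[i:i + 2], 16) for i in (1, 3, 5))
    # standard rec. 601 luminance weighting
    y = round(0.299 * r + 0.587 * g + 0.114 * b)
    return "#%02x%02x%02x" % (y, y, y)

print(to_gray("#ff0000"))  # -> #4c4c4c
```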