Best architecture for proxy?

Jean-Paul Calderone exarkun at divmod.com
Wed Jul 11 10:20:46 EDT 2007


On Wed, 11 Jul 2007 07:00:18 -0700, Andrew Warkentin <andreww at datanet.ab.ca> wrote:
>On Jul 10, 8:19 pm, Steve Holden <s... at holdenweb.com> wrote:
>> Bjoern Schliessmann wrote:
>> > Andrew Warkentin wrote:
>>
>> >> I am going to write a general-purpose modular proxy in Python. It
>> >> will consist of a simple core and several modules for things like
>> >> filtering and caching. I am not sure whether it is better to use
>> >> multithreading, or to use an event-driven networking library like
>> >> Twisted or Medusa/asyncore. Which would be the better
>> >> architecture to use?
>>
>> > I'd definitely use an event-driven approach with Twisted.
>>
>> > Generally, multithreading performs worse than multiplexing. High
>> > performance servers mostly use a combination of both, though.
>>
>> Conversely, I'd recommend Medusa - not necessarily because it's "better",
>> but because I know it better. There's also a nice general-purpose proxy
>> program (though I'd be surprised if Twisted didn't also have one).
>>
>>
>Would an event-driven proxy be able to handle multiple connections
>with large numbers of possibly CPU-bound filters? I use The
>Proxomitron (and would like to write my own proxy that can use the
>same filter sets, but follows the Unix philosophy) and some of the
>filters appear to be CPU-bound, because they cause The Proxomitron to
>hog the CPU (although that might just be a Proxomitron design flaw or
>something). Wouldn't CPU-bound filters only allow one connection to be
>filtered at a time? On the Medusa site, it said that an event-driven
>architecture only works for I/O-bound programs.
>

Handling all of your network traffic with a single OS thread doesn't
necessarily mean that all of your filters need to run in the same
thread (or even in the same process, or on the same computer).

Typically, however, a filtering rule should only need to operate on a
small number of bytes (almost always only a few kilobytes).  Is it the
case that handling even this amount of data incurs a significant CPU
cost?  If not, then there's probably nothing to worry about here, and
you can do everything in a single thread.  If it is the case, then you
might want to keep around a thread pool (or process pool, or cluster)
and push the filtering work to it, reserving the I/O thread strictly
for I/O.  This is still a win, since you end up with a constant number
of processes vying for CPU time (and you can tune this to an ideal
value given your available hardware), rather than one per connection.
This translates directly into reduced context switch overhead.
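A rough sketch of that thread-pool idea, using the standard library's
queue and threading modules (the filter function and connection IDs are
illustrative, not part of any proxy API): the I/O thread enqueues filter
jobs, and a fixed pool of workers runs the CPU-bound filters, so the
number of busy threads stays constant no matter how many connections
are open.

```python
import queue
import threading

POOL_SIZE = 4  # tune to your available hardware
jobs = queue.Queue()
results = queue.Queue()

def cpu_bound_filter(data):
    # Stand-in for a real (possibly expensive) filtering rule.
    return data.upper()

def worker():
    # Each worker pulls jobs until it sees the shutdown sentinel.
    while True:
        item = jobs.get()
        if item is None:
            jobs.task_done()
            break
        conn_id, data = item
        results.put((conn_id, cpu_bound_filter(data)))
        jobs.task_done()

workers = [threading.Thread(target=worker) for _ in range(POOL_SIZE)]
for w in workers:
    w.start()

# The I/O loop would hand chunks off like this instead of
# filtering them inline:
for conn_id, chunk in [(1, "hello"), (2, "world")]:
    jobs.put((conn_id, chunk))

# Shut the pool down and collect results.
for _ in workers:
    jobs.put(None)
jobs.join()
for w in workers:
    w.join()

filtered = dict(results.get() for _ in range(2))
print(filtered)
```

The same shape works with a process pool if the filters hold the GIL,
or with Twisted's deferToThread if you are already inside its reactor.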

Jean-Paul
