[Spambayes] Some ideas I have....

John Draper crunch@shopip.com
Sat, 28 Sep 2002 14:37:13 -0700


Hi,

I want to start another discussion about where the group is heading on the question of where spam filtering should take place, i.e., client side vs. server side. We've all been through this before, but I'd like a little more clarity on the group's direction. Here is a quick summary of the issues as I perceive them...

Client-side filtering allows for more individual customization. Individualized corpora are needed, but this puts the bandwidth burden on the individual: they STILL have to pull the spam down from the POP server. But the POP server can also do filtering. Am I right in understanding that there is to be a more generalized type of filtering, applicable to the whole group of email users sharing the same POP server, followed by more customizable filtering (either through processing scripts on the POP server, or through client-level code running as a POP proxy, or as some other local script needed for filtering)?

Most of these theoretical statistical discussions are far above my head, and since I no longer have my old college statistics textbooks (which are probably outdated anyway), I've decided to focus less on those discussions and instead contribute by helping make our efforts more usable to the spam-fighting community.

My programming experience is mostly in throwing something together to get a "proof of performance" system up and running. My best talent is "organizing" different "Object" modules so they can be used for the widest possible range of applications.

Let me throw in some really cool concepts I learned, which I feel somewhat "attached" to.

I had an opportunity to use Symantec's Think Class Library, a collection, or suite, of powerful C++ objects that model computer functionality in a way analogous to how "people" work. Here is an example. Say we have a dialog box with numerous embedded and hierarchical "views". One can think of the overall dialog as an "organization", not unlike an "Accounting" or "Development" department. In effect, the dialog is a "supervisor" that reigns over each of the elements inside it, and each of those elements can in turn be a "supervisor" over the things inside it. This is NOT to be confused with the "inheritance" hierarchy; it is a separate "command" hierarchy.

The rule is simple: each of the "subordinate bureaucrats", as I like to call them (the name comes from the Think Class Library), never has to know anything about its supervisor. It is the supervisor's responsibility to manage and "inform" the collection of objects under its "rule" or "control".

A "bureaucrat", which is derived from "Object", has just two instance variables: one is a pointer to its supervisor, and the other is a pointer to a List or Collection object holding the other bureaucrats in the organization.
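That two-variable structure maps naturally onto a small Python class. The following is a minimal sketch, not Think Class Library code: the method names `add_subordinate` and `inform` and the `Clerk` subclass are hypothetical, made up for illustration. The supervisor keeps the roster and passes news down; beyond registering itself, a subordinate never calls its supervisor.

```python
from typing import List, Optional


class Bureaucrat:
    """Node in a *command* hierarchy (separate from the inheritance hierarchy).

    Exactly two instance variables: a reference to the supervisor and a
    list of subordinates.
    """

    def __init__(self, supervisor: Optional["Bureaucrat"] = None) -> None:
        self.supervisor = supervisor
        self.subordinates: List["Bureaucrat"] = []
        if supervisor is not None:
            supervisor.add_subordinate(self)

    def add_subordinate(self, sub: "Bureaucrat") -> None:
        """The supervisor, not the subordinate, owns the roster."""
        self.subordinates.append(sub)

    def inform(self, news: str) -> None:
        """Default behavior: pass the news down to everyone under this node."""
        for sub in self.subordinates:
            sub.inform(news)


class Clerk(Bureaucrat):
    """Hypothetical leaf bureaucrat that records what it has been told."""

    def __init__(self, supervisor: Optional[Bureaucrat] = None) -> None:
        super().__init__(supervisor)
        self.heard: List[str] = []

    def inform(self, news: str) -> None:
        self.heard.append(news)
        super().inform(news)


# A tiny "mailroom": a supervisor with one department and one clerk under it.
mailroom = Bureaucrat()
sorting = Bureaucrat(mailroom)
alice = Clerk(sorting)
bob = Clerk(mailroom)
mailroom.inform("new message arrived")
print(alice.heard, bob.heard)
```

The news reaches `alice` through her own supervisor (`sorting`), so the top of the organization never needs to know she exists.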

If a subordinate does something that other subordinate objects have to "know about", it sends a "DoCommand" message with an enumerated command number (known ONLY to the supervisor). This delegates control up to the supervisor, which then knows what to do, such as notifying all the other subordinate objects if appropriate.
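A rough Python sketch of that escalation, assuming a Python version would keep the same shape. `do_command`, `handle`, `notify`, the `Command` enum values, and the `Supervisor`/`Listener` classes are all hypothetical names. The subordinate passes a numbered command upward without knowing what it means; the first bureaucrat in the chain whose `handle` recognizes the command acts on it, here by notifying every other subordinate.

```python
from enum import IntEnum
from typing import List, Optional


class Command(IntEnum):
    """Hypothetical command numbers; only a supervisor interprets them."""
    SPAM_DETECTED = 1
    CONFIG_CHANGED = 2


class Bureaucrat:
    def __init__(self, supervisor: Optional["Bureaucrat"] = None) -> None:
        self.supervisor = supervisor
        self.subordinates: List["Bureaucrat"] = []
        if supervisor is not None:
            supervisor.subordinates.append(self)

    def do_command(self, command: Command, sender: "Bureaucrat") -> bool:
        """Act on `command` here if possible, else pass it up the chain.

        Returns True if some bureaucrat in the chain of command handled it.
        """
        if self.handle(command, sender):
            return True
        if self.supervisor is not None:
            return self.supervisor.do_command(command, sender)
        return False  # reached the top of the organization unhandled

    def handle(self, command: Command, sender: "Bureaucrat") -> bool:
        return False  # an ordinary bureaucrat acts on no commands

    def notify(self, command: Command) -> None:
        pass  # default: ignore notifications


class Listener(Bureaucrat):
    """Hypothetical subordinate that records the notifications it receives."""

    def __init__(self, supervisor: Optional[Bureaucrat] = None) -> None:
        super().__init__(supervisor)
        self.notices: List[Command] = []

    def notify(self, command: Command) -> None:
        self.notices.append(command)


class Supervisor(Bureaucrat):
    """Knows what SPAM_DETECTED means: tell every *other* subordinate."""

    def handle(self, command: Command, sender: Bureaucrat) -> bool:
        if command is Command.SPAM_DETECTED:
            for sub in self.subordinates:
                if sub is not sender:
                    sub.notify(command)
            return True
        return False


boss = Supervisor()
spam_filter = Bureaucrat(boss)  # raises the command without knowing its meaning
log = Listener(boss)
stats = Listener(boss)

handled = spam_filter.do_command(Command.SPAM_DETECTED, spam_filter)
unhandled = spam_filter.do_command(Command.CONFIG_CHANGED, spam_filter)
print(handled, unhandled, log.notices, stats.notices)
```

Because nobody in this small chain handles `CONFIG_CHANGED`, the call climbs to the top and reports `False`; a real organization would put a handler for it somewhere above the sender.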

I propose that I come up with a list of "bureaucrats", as I like to call them, not unlike what might be found in a "mailroom". We could then define spam-fighting bureaucrats that work together on ANY machine, and let people easily put together a host, or Object Class Library, of not just spam-fighting "agents" but also "mail handling" functions.

I'm making a list now of the bureaucrats I would want this to have. When I'm done with it, I'd like to publish it to the group for discussion.

In the meantime, start sending me your nominations for "bureaucrats", and I'll start a list of them. Start with what you've already done (in the way of object classes that could be turned into bureaucrats), define their relationships, then group them into "agencies", or collections of bureaucrats needed for whatever function is appropriate.

I'll throw out some "teasers". We need a "Configurator" bureaucrat whose only job is to manage the server's or client's "configuration". Some people find configuration files a convenient way to manage their settings; others prefer GUIs, either local or web-based. The Configurator would store its objects in some persistent manner, and also be able to read in a "configuration" file and set itself up that way.
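A minimal sketch of such a Configurator, assuming Python's standard-library `configparser` (INI files) as the persistent store. The class, its method names, and the section/option names in the usage lines are hypothetical, and the Bureaucrat base class is left out to keep it short. A GUI or web front end would call `get`/`set`; a configuration file is just another input.

```python
import configparser
import os
import tempfile
from typing import Optional


class Configurator:
    """Hypothetical Configurator: owns the settings and nothing else."""

    def __init__(self) -> None:
        self._config = configparser.ConfigParser()

    def read_file(self, path: str) -> None:
        """Set up from a configuration file (a missing file is ignored)."""
        self._config.read(path, encoding="utf-8")

    def read_text(self, text: str) -> None:
        """Set up from configuration text, e.g. submitted by a web form."""
        self._config.read_string(text)

    def get(self, section: str, option: str,
            default: Optional[str] = None) -> Optional[str]:
        # fallback= is returned when the section or option is missing.
        return self._config.get(section, option, fallback=default)

    def set(self, section: str, option: str, value: str) -> None:
        if not self._config.has_section(section):
            self._config.add_section(section)
        self._config.set(section, option, value)

    def save(self, path: str) -> None:
        """Persist the current settings so the next run can read them back."""
        with open(path, "w", encoding="utf-8") as fh:
            self._config.write(fh)


# Usage with hypothetical section/option names:
cfg = Configurator()
cfg.read_text("[server]\nhost = pop.example.com\n")
cfg.set("filter", "threshold", "0.9")
path = os.path.join(tempfile.mkdtemp(), "spamfilter.ini")
cfg.save(path)

restored = Configurator()
restored.read_file(path)
print(restored.get("server", "host"), restored.get("filter", "threshold"))
```

Note that `configparser` lowercases option names by default, so the example sticks to lowercase keys.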

I'm betting this is already being done, just under different names. So I'm proposing we collect these objects to mirror what a "human" organization would need in order to manage someone's mail. Python already has some of these: "Message" and a few others. We would just give "Message" Bureaucrat as an additional parent class.
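In Python this is plain multiple inheritance. The sketch below assumes the standard library's `email.message.Message` as the "Message" class and a minimal, hypothetical `Bureaucrat` base; `MailItem` is likewise a made-up name. Calling both base initializers explicitly avoids depending on whether `Message.__init__` cooperates with `super()`.

```python
from email.message import Message
from typing import List, Optional


class Bureaucrat:
    """Minimal, hypothetical Bureaucrat: a supervisor and a subordinate list."""

    def __init__(self, supervisor: Optional["Bureaucrat"] = None) -> None:
        self.supervisor = supervisor
        self.subordinates: List["Bureaucrat"] = []
        # Compare with None: a Message with no headers has len() 0 and is falsy.
        if supervisor is not None:
            supervisor.subordinates.append(self)


class MailItem(Message, Bureaucrat):
    """A stdlib email Message that is also a member of the mailroom hierarchy."""

    def __init__(self, supervisor: Optional[Bureaucrat] = None) -> None:
        # Initialize each parent explicitly rather than relying on
        # cooperative super() calls through both bases.
        Message.__init__(self)
        Bureaucrat.__init__(self, supervisor)


desk = Bureaucrat()
msg = MailItem(desk)
msg["Subject"] = "Make money fast"
msg.set_payload("Act now!")
print(msg["Subject"], msg.supervisor is desk, desk.subordinates[0] is msg)
```

The message keeps all of its normal `email` behavior (headers, payload) while also knowing which desk in the mailroom is responsible for it.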

A "Corpus manager" would manage a "corpus" and have all the smarts necessary to do that.

A "Tokenizer" would manage and control all the "Token" objects.
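To make the division of labor concrete, here is a hypothetical sketch of a Tokenizer and a Corpus manager. The whitespace tokenizer is a placeholder (not the SpamBayes tokenizer), plain strings stand in for "Token" objects, and all names are assumptions. The point is that the Corpus manager never splits text itself; it delegates that to its Tokenizer.

```python
from collections import Counter
from typing import Dict, List


class Tokenizer:
    """Hypothetical Tokenizer: the only object that knows how text becomes
    tokens. Lowercased whitespace splitting is a placeholder scheme."""

    def tokenize(self, text: str) -> List[str]:
        return text.lower().split()


class CorpusManager:
    """Hypothetical Corpus manager: keeps per-category message and token
    counts, asking its Tokenizer for tokens instead of splitting text itself."""

    def __init__(self, tokenizer: Tokenizer) -> None:
        self.tokenizer = tokenizer
        self.message_counts: Counter = Counter()
        self.token_counts: Dict[str, Counter] = {}

    def add(self, category: str, text: str) -> None:
        self.message_counts[category] += 1
        counts = self.token_counts.setdefault(category, Counter())
        # Count each distinct token once per message.
        counts.update(set(self.tokenizer.tokenize(text)))

    def count(self, category: str, token: str) -> int:
        """Number of messages in `category` that contain `token`."""
        return self.token_counts.get(category, Counter())[token]


corpus = CorpusManager(Tokenizer())
corpus.add("spam", "FREE money free offer")
corpus.add("spam", "cheap money now")
corpus.add("ham", "meeting notes attached")
print(corpus.count("spam", "money"), corpus.count("spam", "free"))
```

Swapping in a smarter tokenizer only means handing the Corpus manager a different Tokenizer object; the counting code does not change.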

Another one would be Josiah Carlson's "PASP", which would derive from a "POP3" object (of course also deriving from Bureaucrat).

Anyway, I'd welcome your comments on this approach to defining the "objects" needed to make all this useful.

John