[Spambayes] training

Tim Stone - Four Stones Expressions tim at fourstonesExpressions.com
Wed Feb 19 07:11:15 EST 2003

2/18/2003 11:01:42 PM, "Meyer, Tony" <T.A.Meyer at massey.ac.nz> wrote:

>What about this as a method:
>1. POP3proxy adds another header to mail: 'X-Spambayes-ID: XXX' - the id is the same id as in the corpus caches (i.e. at the moment, this is the receiver time and a 'uniquifier').
>2. For manual training, this ID can be entered into the web ui to allow reviewing of that message.  (The existing manual system also stays in place).
>3. Incorrect mail is forwarded to spambayes_spam at localhost or spambayes_ham at localhost.  The SMTP proxy examines mail to either of those addresses (and stops it going further).  It checks for:
>(a) an attached message
>(b) the words "X-Spambayes-ID:" in the body
>And extracts the correct id, and uses this to find the message in the corpus cache, and then does the appropriate training.
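
The id lookup in step 3 could be sketched roughly as follows. This is only an illustration using Python's standard email package, not the actual proxy code: the function name extract_spambayes_id is hypothetical, and the id is assumed to be a single run of non-whitespace characters. It tries an attached message first (case a), then falls back to searching the text body (case b).

```python
import email
import re
from email.message import Message

ID_HEADER = "X-Spambayes-ID"
# Assumption: the id is one run of non-whitespace characters after the header name.
ID_IN_TEXT = re.compile(r"X-Spambayes-ID:\s*(\S+)", re.IGNORECASE)


def extract_spambayes_id(raw_message):
    """Return the first X-Spambayes-ID found in a forwarded message, or None."""
    msg = email.message_from_string(raw_message)
    for part in msg.walk():
        if part.get_content_type() == "message/rfc822":
            # (a) an attached message: its original headers are intact
            for attached in part.get_payload():
                if isinstance(attached, Message) and attached[ID_HEADER]:
                    return attached[ID_HEADER].strip()
        elif part.get_content_maintype() == "text":
            # (b) the header text quoted somewhere in a text body
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            try:
                text = payload.decode(charset, errors="replace")
            except LookupError:  # unknown charset name in the message
                text = payload.decode("utf-8", errors="replace")
            match = ID_IN_TEXT.search(text)
            if match:
                return match.group(1)
    return None
```

If neither case matches, the function returns None, which is the situation a header-stripping mailer would produce.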

The problem here is that some mailers lose most of the headers when you forward a message.  Placing something like a URL in the body of the message is another possibility that's been raised.  It's somewhat dangerous, particularly for multipart messages, and in HTML messages it may not be visible at all.  SpamAssassin modifies the subject for exactly these reasons: it's the one header that can pretty much be guaranteed to be there when you need it, and it's testable with almost any mailer's filtering mechanism.  But you can't put a URL in it, and putting in an id that the user has to cut and paste, while better than nothing, doesn't really make life much easier for J. Q. Public.  - TimS
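
To make the subject-line option concrete, here is a minimal sketch of tagging the subject with the id and reading it back after a forward. The "[SB:...]" tag format and both function names are invented for illustration; nothing here is what SpamAssassin or Spambayes actually does.

```python
import re
from email.message import Message

# Hypothetical tag format: "[SB:<id>]" anywhere in the subject.
TAG_PATTERN = re.compile(r"\[SB:([^\]\s]+)\]")


def tag_subject(msg, msg_id):
    """Prefix the Subject header with an id tag, unless one is already there."""
    subject = msg.get("Subject", "")
    if TAG_PATTERN.search(subject):
        return
    del msg["Subject"]  # no error if the header is missing
    msg["Subject"] = f"[SB:{msg_id}] {subject}".rstrip()


def id_from_subject(subject):
    """Recover the id from a (possibly 'Fwd:'-prefixed) subject, or None."""
    match = TAG_PATTERN.search(subject or "")
    return match.group(1) if match else None
```

This survives a forward as long as the mailer keeps the subject, but it does clutter every subject line, and the user still sees an id rather than anything clickable.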

>I've done 1 & 2, which at least solves the problem of finding a message to train in a huge cache.  The SMTP proxy was at least partially done, which just leaves the searching (not that difficult) and the training hooks.
>If a mail application failed to include the headers in either an attached message or in the body (say it strips them), then there could always be an option to include the id in the message body (as ugly and intrusive as that is).  This would work with Outlook Express and Eudora at least (I don't have anything else to test).
>What do you think?
>If someone still has it, could they send me the SMTP proxy prototype code?

Hmmmm.... good question, I should have it somewhere (I wrote it).  But it's not integrated with pop3proxy, so database updates from either one clobber the other's.  It really needs to be a single process, and Richie was going to do that until we told him we didn't see any particular value in the work.  If we cannot guarantee that the header we need will be there with all mailers, then we either have to change the mechanism or begin to account for different mailers, which would be really awful (of course).

>=Tony Meyer
>Spambayes mailing list
>Spambayes at python.org

c'est moi - TimS
