[Spambayes] training

Meyer, Tony T.A.Meyer at massey.ac.nz
Wed Feb 19 18:01:42 EST 2003

What about this as a method:

1. POP3proxy adds another header to each mail: 'X-Spambayes-ID: XXX'. The ID is the same one used in the corpus caches (at the moment, this is the time the message was received plus a 'uniquifier').

2. For manual training, this ID can be entered into the web UI to bring up that message for review. (The existing manual system also stays in place.)

3. Misclassified mail is forwarded to spambayes_spam@localhost or spambayes_ham@localhost. The SMTP proxy examines mail addressed to either of those addresses (and stops it from going any further). It checks for:
(a) an attached message
(b) the words "X-Spambayes-ID:" in the body
It then extracts the ID, uses it to find the message in the corpus cache, and does the appropriate training.
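Step 1 above can be sketched with Python's standard `email` package. This is a minimal illustration, not the POP3proxy code: the ID format (integer receive time, a dash, then a counter as the 'uniquifier') and the names `make_spambayes_id` and `add_id_header` are hypothetical.

```python
import itertools
import time
from email import message_from_string
from email.message import Message

ID_HEADER = "X-Spambayes-ID"

# Hypothetical uniquifier: a process-wide counter, so two messages received
# in the same second still get distinct IDs.
_uniquifier = itertools.count()


def make_spambayes_id(received=None):
    """Return an ID built from the receive time plus a uniquifier (assumed format)."""
    if received is None:
        received = time.time()
    return "%d-%d" % (int(received), next(_uniquifier))


def add_id_header(msg: Message, msg_id: str) -> Message:
    """Stamp msg with X-Spambayes-ID, replacing any header already present."""
    del msg[ID_HEADER]  # removes all copies; no error if the header is absent
    msg[ID_HEADER] = msg_id
    return msg


if __name__ == "__main__":
    msg = message_from_string("Subject: hello\n\nbody\n")
    add_id_header(msg, make_spambayes_id())
    print(msg[ID_HEADER])
```

The point of the design is that the same string would also key the message's entry in the corpus cache, which is what makes the web UI lookup in step 2 possible.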

I've done 1 & 2, which at least solves the problem of finding the message to train on in a huge cache. A prototype of the SMTP proxy was partially written, so what remains is the searching (not that difficult) and the training hooks.
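The remaining SMTP-side pieces, finding the ID in a forwarded mail via (a) or (b) and looking it up in the cache, might look roughly like this. It is a sketch under stated assumptions: the corpus cache is treated as a plain dict keyed by ID, and `find_spambayes_id` and `resolve_training_request` are hypothetical names; actually calling the classifier's training code is left out.

```python
import re
from email import message_from_string
from email.message import Message

ID_HEADER = "X-Spambayes-ID"
# Matches "X-Spambayes-ID: <id>" anywhere in a text body (case b).
ID_PATTERN = re.compile(r"X-Spambayes-ID:\s*(\S+)", re.IGNORECASE)

# The two training addresses from the proposal, mapped to a training label.
TRAINING_ADDRESSES = {
    "spambayes_spam@localhost": "spam",
    "spambayes_ham@localhost": "ham",
}


def find_spambayes_id(msg: Message):
    """Return the ID from an attached message (a) or the body text (b), or None."""
    # (a) A message forwarded as an attachment arrives as a message/rfc822 part.
    for part in msg.walk():
        if part.get_content_type() == "message/rfc822":
            inner = part.get_payload(0)
            if inner[ID_HEADER]:
                return inner[ID_HEADER].strip()
    # (b) An inline forward usually quotes the original headers in a text part.
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        raw = part.get_payload(decode=True) or b""
        try:
            text = raw.decode(part.get_content_charset() or "ascii", errors="replace")
        except LookupError:  # charset name Python does not recognise
            text = raw.decode("latin-1")
        match = ID_PATTERN.search(text)
        if match:
            return match.group(1)
    return None


def resolve_training_request(recipients, msg: Message, corpus):
    """Return (label, cached_message) for a training mail.

    Returns None if the mail is not addressed to exactly one training
    address, or if no cached message matches its ID. `corpus` is assumed
    to be a dict-like cache keyed by X-Spambayes-ID.
    """
    labels = {TRAINING_ADDRESSES[r.lower()] for r in recipients
              if r.lower() in TRAINING_ADDRESSES}
    if len(labels) != 1:  # not addressed to us, or to both addresses at once
        return None
    msg_id = find_spambayes_id(msg)
    if msg_id is None or msg_id not in corpus:
        return None
    return labels.pop(), corpus[msg_id]
```

Checking the attachment first means an ID quoted in the user's own covering text cannot shadow the header of the actual forwarded message.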

If a mail application fails to include the headers in either an attached message or the body (say, because it strips them), there could always be an option to add the ID to the message body itself (as ugly and intrusive as that is). This would work with Outlook Express and Eudora at least (I don't have any other clients to test with).
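The body-ID option could be as simple as copying the header into the first plain-text part as an `X-Spambayes-ID: <id>` line, the form that check (b) in step 3 looks for. A minimal sketch, assuming the hypothetical helper `add_id_to_body`; to stay safe it only touches unencoded (7bit/8bit) text/plain parts and leaves base64 or quoted-printable bodies alone.

```python
from email import message_from_string
from email.message import Message

ID_HEADER = "X-Spambayes-ID"


def add_id_to_body(msg: Message) -> bool:
    """Copy the X-Spambayes-ID header into the first plain-text body part.

    Only unencoded (7bit/8bit) text/plain parts are modified.
    Returns True if a line was added, False otherwise.
    """
    msg_id = msg[ID_HEADER]
    if not msg_id:
        return False
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        cte = part.get("Content-Transfer-Encoding", "7bit").strip().lower()
        if cte not in ("7bit", "8bit"):
            continue  # skip encoded bodies rather than risk corrupting them
        text = part.get_payload()
        if not text.endswith("\n"):
            text += "\n"
        # Same "X-Spambayes-ID: <id>" form that the body search (b) expects.
        part.set_payload("%s\n%s: %s\n" % (text, ID_HEADER, msg_id))
        return True
    return False


if __name__ == "__main__":
    m = message_from_string("Subject: hi\nX-Spambayes-ID: 1045695702-9\n\nhello\n")
    add_id_to_body(m)
    print(m.get_payload())
```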

What do you think?

If someone still has it, could they send me the SMTP proxy prototype code?

=Tony Meyer
