[spambayes-dev] Correcting training

Meyer, Tony T.A.Meyer at massey.ac.nz
Wed Sep 3 13:50:19 EDT 2003


> However, I don't think the filter has ever caught a 
> "Nigerian" spam or other "business proposal" sorts of 
> messages.

It did take me quite a lot of training to get these correctly
classified, and even now they sit at about 90% rather than the 100% that
just about everything else gets.  I imagine that just a few mistrained
messages would make it extremely difficult to classify them correctly.

> Sometimes I will copy 
> the source and go "Train as Spam" with it, but as far as I 
> know, that just means that the message has been trained as 
> both Ham and Spam.

That's correct.  However, if you retrain it from the 'review' page, then
it will be untrained and retrained (assuming that the messageinfo.db
file is all happy and running correctly), as spambayes will recognise
the id and take the correct action.
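
To make the difference concrete, here is a rough sketch of the
untrain-then-retrain idea.  It is not the spambayes code: ToyFilter, its
counts and the trained_as dictionary (standing in for messageinfo.db)
are all made up for illustration.

```python
from collections import Counter


class ToyFilter:
    """Illustrative stand-in only; this is NOT the spambayes classifier."""

    def __init__(self):
        # Per-class token counts.
        self.counts = {"ham": Counter(), "spam": Counter()}
        # Message id -> label it was last trained as.
        # Hypothetical substitute for the role messageinfo.db plays.
        self.trained_as = {}

    def train(self, msg_id, tokens, label):
        """Train a message, undoing any earlier training under another label."""
        previous = self.trained_as.get(msg_id)
        if previous == label:
            return  # already trained this way, nothing to do
        if previous is not None:
            # Untrain: take the tokens back out of the old class.
            self.counts[previous].subtract(tokens)
            # Unary plus drops entries whose count fell to zero or below.
            self.counts[previous] = +self.counts[previous]
        self.counts[label].update(tokens)
        self.trained_as[msg_id] = label


f = ToyFilter()
f.train("id-1", ["nigeria", "proposal"], "ham")   # the original mistake
f.train("id-1", ["nigeria", "proposal"], "spam")  # corrected via its id
```

Because the id is known, the second call removes the tokens from the ham
counts before adding them to spam.  Training a pasted copy with no
recognisable id would skip that first step and leave the tokens counted
on both sides.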

How do I get the message back on the review page once I've already
trained, you ask?  Use the "Find message" query, I answer :)  If you set
spambayes to add the id header, then you can look at the message's
headers, find the id, paste it into the box on the web interface, and it
will present that message in a 'review' page, which lets you retrain.
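
If you'd rather not hunt through the headers by eye, something like the
following should pull the id out of a saved message.  It uses only the
standard email module; the header name below is an assumption on my
part, so check what your spambayes configuration actually adds.

```python
from email import message_from_string

# Assumed header name, for illustration only; check your own configuration.
ID_HEADER = "X-Spambayes-MsgId"


def find_message_id(raw_message, header_name=ID_HEADER):
    """Return the id header's value from a raw message, or None if absent."""
    msg = message_from_string(raw_message)
    value = msg.get(header_name)  # header lookup is case-insensitive
    return value.strip() if value is not None else None


raw = (
    "From: someone@example.com\n"
    "Subject: Business proposal\n"
    "X-Spambayes-MsgId: 1062611419-3\n"
    "\n"
    "Dear friend...\n"
)
print(find_message_id(raw))
```

The printed value is what you would paste into the "Find message" box.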

smtpproxy, assuming it's able to find the id, should also correctly
untrain/train, as should the (also soon-to-be-renamed) overkill.py
script, once it's complete.

> Would it be possible to add an "Untrain as Ham, Train as 
> Spam" function?

This sounds like a reasonable request - could you open a tracker item on
SourceForge for it?  Otherwise it might get forgotten.  Given that we're
only a day off a brief feature-freeze, I can't see this happening before
then.  It should be easy enough to get into 1.1a1, though (is that how
the version numbering works?).  Feel free to assign it to me if you like
(anadelonbrin).

=Tony Meyer


