[Spambayes] [ spambayes-Feature Requests-695059 ] wildcard support for mboxtrain
T. Alexander Popiel
popiel at wolfskeep.com
Fri Mar 21 08:24:30 EST 2003
In message: <200303210933.12940.tdickenson at devmail.geminidataloggers.co.uk>
Toby Dickenson <tdickenson at devmail.geminidataloggers.co.uk> writes:
>On Friday 21 March 2003 1:49 am, SourceForge.net wrote:
>It sounds like you are aiming the same direction as me.
>> it would be very helpful if mboxtrain would accept
>> wildcards for mail folder identification. [...]
>I am currently using a script that extracts all my mail folder names from a
>kmail configuration file, then builds up a long hammie command line and
>executes it. (I'm happy to contribute this if anyone is interested)
>This is working well for me.
My approach to this problem is that I make two copies of every mail;
one copy goes into an 'everything' folder, and the other copy gets
delivered into 'inbox' or 'newspam' as appropriate. As I review spam
(or find it as false negatives), I move it into a 'spam' folder.
For training, ham = everything - spam - newspam. Naming three folders
doesn't seem to be a big deal, whereas naming all the innumerable
folders that my inbox gets sorted into would be.
The code to do this is checked in under contrib as bulktrain.sh and
bulkgraph.py, and described in BULK.txt.
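
To illustrate the "ham = everything - spam - newspam" idea, here is a
minimal Python sketch. It is not the code in bulktrain.sh; it assumes the
three folders are mbox files and that messages can be matched by their
Message-ID header. The function names are hypothetical.

```python
import mailbox


def message_ids(path):
    """Collect the Message-ID headers found in one mbox file.

    Assumption: the folder is stored in mbox format and every message
    that matters carries a Message-ID header.
    """
    ids = set()
    for msg in mailbox.mbox(path, create=False):
        msg_id = msg.get("Message-ID")
        if msg_id:
            ids.add(msg_id.strip())
    return ids


def select_ham_ids(everything_ids, spam_ids, newspam_ids):
    """Ham is whatever is in 'everything' but in neither spam folder."""
    return everything_ids - spam_ids - newspam_ids
```

With this approach only three folder names ever need to be passed in, no
matter how many folders the inbox is later sorted into.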
>Every day I perform a full train using hammie, not mboxtrain's incremental
>approach. This means I can use the mail reader to expire old messages, and
>have them removed from the spambayes database.
I just ignore everything more than 120 days old, personally... and
that's just to keep the database around 20 meg. Tests show that it
hurts accuracy by less than 1%. Of course, ignoring everything over
a week old hurts less than 5%...
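
A rough Python sketch of such an age cutoff, assuming the age is taken
from each message's Date header (the real scripts may well use a
different source, such as the delivery time). The helper name and its
parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def is_recent(date_header, now, max_age_days=120):
    """Return True if the Date header is at most max_age_days before now.

    `now` must be a timezone-aware datetime. Messages with a missing or
    unparseable Date header are treated as too old (an assumption).
    """
    try:
        sent = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        return False
    if sent.tzinfo is None:
        # A "-0000" zone yields a naive datetime; treat it as UTC.
        sent = sent.replace(tzinfo=timezone.utc)
    return now - sent <= timedelta(days=max_age_days)
```

Calling it with max_age_days=7 gives the one-week variant mentioned above.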