[Spambayes] Obvious Spam Missed...

Ryan Malayter rmalayter at bai.org
Tue Sep 16 17:12:15 EDT 2003

From: Skip Montanaro [mailto:skip at pobox.com] 
> ("yup, that's ham <check>. yup, that's ham <check>....),
> declaring all the mailboxes which are full of ham
> to the Outlook plugin.  Most people don't save the 
> spam they receive, so doing that would cause them to 
> immediately start off with a horrible imbalance.

> To make matters worse, if much of the ham saved is 
> very old, it may not accurately reflect what current 
> ham looks like.

Which brings us back to the "automatic subset" feature that Tim, I, and
others were talking about in the SpamBayes "feature requests" forum.

Like most people, I organize my messages into a lot of folders, and I
want to train on messages from each of those folders, so that my
personal ham, mailing list ham, and work-related ham are all
represented. There's no easy way to get a representative subset with
the current Outlook plug-in; it would have to be done by hand, and that
would take quite a while.

The algorithm I suggested over in the feature requests forum would solve
both the age and "representative sample" problems neatly. Perhaps I
should just write a script (in VBA, I guess) that creates the
representative sample automatically; then I can train with it and we
can post it as an add-on or something. If people like it, it can be
integrated into the plug-in installer.
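
A minimal sketch of one way such a script could pick its sample, in
Python rather than VBA. The algorithm from the feature requests forum
is not restated in this message, so the rule used here (take up to N of
the newest messages from every folder) is only an assumption, and the
Message type, representative_sample() and per_folder are hypothetical
names, not part of SpamBayes or the Outlook plug-in.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical stand-in for a mail message; not a SpamBayes or Outlook type.
Message = namedtuple("Message", ["folder", "received", "subject"])


def representative_sample(messages, per_folder=50):
    """Return up to per_folder of the newest messages from each folder.

    Assumed rule, for illustration only: capping each folder keeps one
    large folder from dominating the sample, and preferring recent
    messages keeps very old ham out of the training set.
    """
    # Group the messages by the folder they live in.
    by_folder = {}
    for msg in messages:
        by_folder.setdefault(msg.folder, []).append(msg)

    sample = []
    for folder in sorted(by_folder):
        # Newest first, then keep at most per_folder of them.
        newest_first = sorted(
            by_folder[folder], key=lambda m: m.received, reverse=True
        )
        sample.extend(newest_first[:per_folder])
    return sample


# Example usage with made-up messages.
inbox = [
    Message("Personal", datetime(2003, 9, 15), "Dinner Friday?"),
    Message("Personal", datetime(2001, 3, 2), "Old holiday photos"),
    Message("Work", datetime(2003, 9, 16), "Budget review"),
]
training_set = representative_sample(inbox, per_folder=1)
```

With per_folder=1, the old "Personal" message is left out and each
folder contributes its newest message. A real script would still have
to walk the Outlook folder tree to build the message list.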


