[Spambayes] Obvious Spam Missed...
rmalayter at bai.org
Tue Sep 16 17:12:15 EDT 2003
From: Skip Montanaro [mailto:skip at pobox.com]
> ("yup, that's ham <check>. yup, that's ham <check>....),
> declaring all the mailboxes which are full of ham
> to the Outlook plugin. Most people don't save the
> spam they receive, so doing that would cause them to
> immediately start off with a horrible imbalance.
> To make matters worse, if much of the ham saved is
> very old, it may not accurately reflect what current
> ham looks like.
Which brings us back to the "automatic subset" feature that Tim, I, and
others were discussing in the SpamBayes "feature requests" forum.
Like most people, I organize my messages into a lot of folders, and I
want to train on messages from each of those folders, so that my
personal ham, mailing list ham, and work-related ham are all
represented. There's no easy way to get a representative subset with
the current Outlook plug-in; it would have to be done by hand, and that
would take quite a while.
The algorithm I suggested over in the feature requests forum would solve
both the age and "representative sample" problems neatly. Perhaps I
should just write a script (in VBA, I guess) that creates the
representative sample automatically, and then I can train with it and we
can post it as an add-on or something. If people like it, it can be
integrated into the plug-in installer.
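This post doesn't spell out the algorithm from the feature requests forum. What follows is only a hypothetical sketch of one way to build such a subset. It uses Python, although the post mentions VBA. It first drops messages older than a cutoff, which addresses the age problem. It then takes the newest messages from each folder, up to an equal per-folder quota, so that every folder is represented. The function name `representative_subset`, the `(received_datetime, message_id)` input shape, and the `max_age_days` cutoff are assumptions for illustration. They are not part of SpamBayes or the Outlook plug-in.

```python
from datetime import datetime, timedelta


def representative_subset(folders, total, max_age_days, now):
    """Hypothetical helper: pick up to `total` recent messages spread across folders.

    folders: dict mapping folder name -> list of (received_datetime, message_id).
    Returns a list of (folder_name, message_id) pairs.
    """
    cutoff = now - timedelta(days=max_age_days)
    # Keep only messages newer than the cutoff, sorted newest first, per folder.
    recent = {
        name: sorted((m for m in msgs if m[0] >= cutoff), reverse=True)
        for name, msgs in folders.items()
    }
    # Folders with no recent mail contribute nothing.
    recent = {name: msgs for name, msgs in recent.items() if msgs}
    if not recent:
        return []
    # Equal quota per folder, so small folders are not swamped by large ones.
    quota = max(1, total // len(recent))
    chosen = []
    for name in sorted(recent):
        chosen.extend((name, mid) for _, mid in recent[name][:quota])
    return chosen[:total]
```

An equal quota favors small folders, such as a low-traffic personal folder. A quota proportional to each folder's size would be a reasonable alternative if the training set should mirror actual mail volume.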