[Spambayes] Obvious Spam Missed...

Tim Peters tim.one at comcast.net
Tue Sep 16 21:48:21 EDT 2003


[Peter Beckman]
> What about including a "starter" HammieDB, something that changes over
> time, based on a bunch of different people's settings.
>
> By starting with a nice ratio, hopefully most of the spam will go to
> the right place, and in the beginning, a lot of ham might be
> classified as "unsure", but that would happen anyway.
>
> At least most of the current "spam" would get marked as such.
>
> I know, email is a very personal thing and can't be "generalized" but
> I wonder if this would help in some cases where the people were less
> than aware of the ham/spam training stuff.

InBoxer <http://www.inboxer.com/> includes a starter database.  I think they
put a lot of work into that, which doesn't mean we can't, but does mean it
requires someone who believes in it enough to devote their free time to
doing that work.

One thing to watch out for is that if you put "too much" data into the
starter database, more training is needed to cater to personal quirks
than if a new user starts with an empty database.

There's also a real danger of systematically ruining the classifier for some
classes of users.  For example, one of the co-founders of the Motley Fool
posted last week, saying the folks who work there are happy with spambayes.
But any reasonably large spam collection is going to contain a ton of
stock-pumping scam spam, and that can poison the database for people who
work with stocks for a living ("Dow" appears in 500 spam and no ham at the
start ... OK, now it appears in 1 ham ..., OK, now in 2 ham, ... overcoming
the initial bias can be a bitch).

Not to mention those of us who moonlight as porn stars <wink>.
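
To put rough numbers on the "Dow" example, here is a minimal sketch in
Python. It assumes a Robinson-style smoothed per-token estimate of the kind
spambayes uses; the function names, the s and x defaults, and the 5000-spam /
5000-ham starter corpus are illustrative assumptions, not figures from any
real starter database.

```python
def spam_prob(spam_count, ham_count, nspam, nham, s=0.45, x=0.5):
    """Smoothed estimate that a token indicates spam (Robinson-style).

    s (strength of the prior) and x (probability assigned to an unseen
    token) are assumed values chosen for illustration.
    """
    # Avoid dividing by zero when one class has no training data yet.
    spam_ratio = spam_count / max(nspam, 1)
    ham_ratio = ham_count / max(nham, 1)
    if spam_ratio + ham_ratio == 0:
        return x  # token never seen: fall back to the prior
    p = spam_ratio / (spam_ratio + ham_ratio)
    n = spam_count + ham_count
    return (s * x + n * p) / (s + n)


def hams_needed(spam_count, nspam, nham, threshold=0.5):
    """Count the ham messages containing the token that must be trained
    before its spam probability drops below `threshold`."""
    k = 0
    while spam_prob(spam_count, k, nspam, nham + k) >= threshold:
        k += 1
    return k


# Hypothetical starter database: "dow" in 500 of 5000 spam, 0 of 5000 ham.
print(spam_prob(500, 0, 5000, 5000))   # token looks almost purely spammy
print(spam_prob(500, 2, 5000, 5002))   # two ham later: barely moved
print(hams_needed(500, 5000, 5000))    # ham needed to reach neutral
print(hams_needed(0, 0, 0))            # same question, empty database
```

Under these assumptions the token only becomes neutral once its ham rate
catches up with its spam rate, which takes several hundred ham messages
mentioning it, while a user starting from an empty database gets there
after the first one.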



