[Spambayes] Mail classifiers, training sets and technical docs

Anthony Baxter anthony at interlink.com.au
Mon Dec 30 18:24:57 EST 2002


>>> Tim Peters wrote
> There are many public spam archives available, so no on that count.  It
> works better if people use their own spam anyway (for example, that's the
> only way to pick up header clues unique to their ISP).  We can't supply a
> large training set of ham because what constitutes ham is specific to the
> user.  It "would be nice" to seed a database with some set of msgs everyone
> would agree are ham, but that's surprisingly difficult to arrive at.

A thought that occurs to me now: would it make more sense to ship a
database seeded with a few obvious clues, rather than with whole
messages? For instance, start with a bunch of the standard "really
really really bogus spam clues" from SpamAssassin.

That way, people will hopefully start to get results immediately...
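
A rough sketch of what "seeding with clues" might look like, to make the
idea concrete. This is not the Spambayes API: the SeededDB class, the
seed_weight parameter and the three clue tokens are all made up for
illustration, and the scoring is a plain naive-Bayes combination rather
than whatever the project actually uses.

```python
import math

# Hypothetical seed clues, NOT taken from SpamAssassin's real rule set.
SEED_SPAM_CLUES = ["viagra", "mortgage", "unsubscribe"]


class SeededDB:
    """Toy token database that can start out pre-loaded with spam clues."""

    def __init__(self, seed_tokens=(), seed_weight=5):
        self.spam = {}   # token -> number of spam messages containing it
        self.ham = {}    # token -> number of ham messages containing it
        self.nspam = 0
        self.nham = 0
        # Assumption: seeding pretends we already saw `seed_weight` spam
        # messages that each contained every seed token.
        for tok in seed_tokens:
            self.spam[tok] = seed_weight
        if seed_tokens:
            self.nspam = seed_weight

    def train(self, tokens, is_spam):
        """Count each distinct token once per message."""
        counts = self.spam if is_spam else self.ham
        for tok in set(tokens):
            counts[tok] = counts.get(tok, 0) + 1
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1

    def token_prob(self, tok):
        """Smoothed spam probability for one token; 0.5 if never seen."""
        if tok not in self.spam and tok not in self.ham:
            return 0.5
        s = (self.spam.get(tok, 0) + 1) / (self.nspam + 2)
        h = (self.ham.get(tok, 0) + 1) / (self.nham + 2)
        return s / (s + h)

    def score(self, tokens):
        """Combine per-token probabilities in log-odds space."""
        log_odds = 0.0
        for tok in set(tokens):
            p = self.token_prob(tok)
            log_odds += math.log(p) - math.log(1.0 - p)
        return 1.0 / (1.0 + math.exp(-log_odds))


empty = SeededDB()
seeded = SeededDB(SEED_SPAM_CLUES)
msg = ["cheap", "viagra", "unsubscribe"]
print(empty.score(msg), seeded.score(msg))
```

With no training at all, the empty database scores every message at
exactly 0.5, while the seeded one already leans towards spam for a
message containing the clues. Because the seed is just a handful of
pseudo-counts, a user whose ham really does mention one of these words
can train it away with their own mail.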

Bah, brain foggy from too much Christmas, probably making no sense at
all.



-- 
Anthony Baxter     <anthony at interlink.com.au>   
It's never too late to have a happy childhood.



