[Spambayes] Outlook plugin - training

Anthony Baxter anthony@interlink.com.au
Wed Nov 6 22:19:40 2002


>>> Tim Peters wrote
> Automatic training needs lots of work.  The Outlook client has gotten
> smarter than anything else about this so far, but at the moment it's
> basically automating "mistake based" training, which I think will prove to
> be a Bad Idea over time.
> 
> Ideal is to train regularly on a random sample of all msgs, whether or not
> correctly classified (I fake this by hand for now).  That presents some UI
> and algorithmic challenges.

Note that "random sample" isn't trivial either - if the ham:spam ratio
in your training DB is very high, your accuracy will suffer (see the
tests from Alex, myself and others).

An easy example of this is those of us who are on a bunch of
higher-volume python.org lists - Greg's sterling work there means that
very little spam gets through, so a plain random sample of that mail
would be almost all ham.

As spambayes takes over the world, this could be a larger problem.
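
One way to sketch a sample that stays random but doesn't let ham swamp
the training DB is to draw from ham and spam separately and cap the
ratio. This is only an illustration: balanced_sample, max_ratio and the
default of 1.0 are hypothetical names and values, not part of spambayes,
and what ratio actually works best is a question for the tests.

```python
import random


def balanced_sample(ham, spam, n, max_ratio=1.0, seed=None):
    """Pick at most n messages for training, with ham capped at
    max_ratio times the number of spam picked.

    ham and spam are lists of messages (any objects). Returns a
    (ham_sample, spam_sample) tuple. Hypothetical helper, not a
    spambayes API.
    """
    rng = random.Random(seed)
    # Take as much spam as fits in n once the ham share is reserved,
    # but never more spam than is actually available.
    n_spam = min(len(spam), int(n / (1 + max_ratio)))
    # Ham is limited by the ratio cap, by the remaining budget, and by
    # how much ham there is.
    n_ham = min(len(ham), int(n_spam * max_ratio), n - n_spam)
    return rng.sample(ham, n_ham), rng.sample(spam, n_spam)


if __name__ == "__main__":
    # A mailbox like the python.org-list case: nearly all ham.
    ham = ["ham-%d" % i for i in range(990)]
    spam = ["spam-%d" % i for i in range(10)]
    h, s = balanced_sample(ham, spam, 100, max_ratio=2.0, seed=0)
    print(len(h), len(s))
```

The catch is visible in the example: with only 10 spam on hand, the
sample shrinks well below the 100 asked for, because the scarce class
sets the size of the whole training batch.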

Anthony
-- 
Anthony Baxter     <anthony@interlink.com.au>   
It's never too late to have a happy childhood.



