[Spambayes] Outlook plugin - training
Anthony Baxter
anthony@interlink.com.au
Wed Nov 6 22:19:40 2002
>>> Tim Peters wrote
> Automatic training needs lots of work. The Outlook client has gotten
> smarter than anything else about this so far, but at the moment it's
> basically automating "mistake based" training, which I think will prove to
> be a Bad Idea over time.
>
> Ideal is to train regularly on a random sample of all msgs, whether or not
> correctly classified (I fake this by hand for now). That presents some UI
> and algorithmic challenges.
Note that "random sample" is not as trivial as all that, either - if
your training DB ends up with a very high ham:spam ratio, your accuracy
will suffer (see the tests from Alex, myself and others).
An easy example of this is those of us who are on a bunch of
higher-volume python.org lists - Greg's sterling work there means that
very little spam gets through, so a purely random sample of that mail
would be almost all ham.
As spambayes takes over the world, this could be a larger problem.
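
One way to picture the alternative is to sample ham and spam separately
and cap the ratio between them. What follows is only a rough sketch of
that idea, not anything the Outlook plugin or spambayes actually does:
the function name balanced_sample, the max_ratio parameter and the
default of 1.0 are all made up for illustration.

```python
import random

def balanced_sample(ham, spam, n, max_ratio=1.0, seed=None):
    """Pick up to n messages for training, keeping ham:spam <= max_ratio.

    ham and spam are lists of messages. Hypothetical helper, not part
    of spambayes.
    """
    rng = random.Random(seed)
    # Give spam its share of the budget first, limited by how much
    # spam is actually available.
    n_spam = min(len(spam), max(1, round(n / (1 + max_ratio))))
    # Ham is capped by the remaining budget and by the allowed ratio.
    n_ham = min(len(ham), n - n_spam, int(n_spam * max_ratio))
    return rng.sample(ham, n_ham), rng.sample(spam, n_spam)
```

With 1000 ham and 20 spam on hand and a budget of 100 messages, this
picks 20 of each instead of roughly 98 ham and 2 spam, at the cost of
training on fewer messages overall.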
Anthony
--
Anthony Baxter <anthony@interlink.com.au>
It's never too late to have a happy childhood.