Third result ... RE: [Spambayes] First result from Gary Robinson's ideas

Anthony Baxter anthony@interlink.com.au
Fri, 20 Sep 2002 00:43:35 +1000


>>> Anthony Baxter wrote
> total unique fp went from 33 to 38 lost   +15.15%
> mean fp % went from 0.366932315011 to 0.42251875192 lost   +15.15%

> total unique fn went from 55 to 47 won    -14.55%
> mean fn % went from 0.710811726238 to 0.60736899408 won    -14.55%
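
For anyone checking the arithmetic, the percentages quoted above are plain relative changes against the "before" count. A minimal sketch, where the helper name pct_change is made up for illustration and is not part of the test driver:

```python
def pct_change(before, after):
    """Relative change from before to after, as a percentage."""
    return (after - before) / before * 100.0

# Counts taken from the quoted results above.
fp_change = pct_change(33, 38)   # false positives went up, so this is positive
fn_change = pct_change(55, 47)   # false negatives went down, so this is negative

print(f"fp: {fp_change:+.2f}%")
print(f"fn: {fn_change:+.2f}%")
```

A positive number means more errors ("lost"), a negative number means fewer ("won").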

In private conversation with Gary, he suggested I post a reminder about
my particular test corpus. It's a considerably more brutal test case, as
it's three and a bit years' worth of email to the contact email addresses
for the company I work at. So far it's got 9,000 ham and 7,500 spam;
another 12,000 messages remain to be categorised.

This means we get email from customers complaining about credit card charges,
people forwarding spam messages to complain, genuine sales enquiries or
reseller enquiries: all sorts of things that drive SpamAssassin mental.

It turns out that Gary's approach is sensitive to small numbers of very
spammy words (his description) and that this is why I'm getting worse
f-p results...
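
To illustrate the kind of sensitivity meant here, below is a rough sketch. It assumes the geometric-mean combining rule from Gary's essay (P, Q and S = (P - Q) / (P + Q), rescaled to 0..1); the function name and the per-word probabilities are invented for the example, and this is not the code that produced the numbers above.

```python
import math

def robinson_score(probs):
    """Combine per-word spam probabilities via geometric means (assumed rule)."""
    n = len(probs)
    # P is 1 minus the geometric mean of (1 - p); it rises with spammy words.
    P = 1.0 - math.exp(sum(math.log(1.0 - p) for p in probs) / n)
    # Q is 1 minus the geometric mean of p; it rises with hammy words.
    Q = 1.0 - math.exp(sum(math.log(p) for p in probs) / n)
    S = (P - Q) / (P + Q)
    return (1.0 + S) / 2.0  # rescale from [-1, 1] to [0, 1]

# Hypothetical ham message: 20 mildly hammy words.
hammy = [0.3] * 20
# The same message with just two very spammy words added,
# e.g. a customer forwarding a spam to complain about it.
with_spammy = hammy + [0.99] * 2

print(robinson_score(hammy))
print(robinson_score(with_spammy))
```

With these made-up inputs, two very spammy words out of 22 pull the score well up from the all-hammy baseline, which is the sort of shift that could tip a borderline ham over a cutoff.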

[gary - sorry for the false alarm - it's actually something with the mail
setup on my laptop that's causing email directly from me to you to bounce.
working on it now...]

Anthony