[Spambayes] more testing: spam/ham proportions?

Anthony Baxter anthony@interlink.com.au
Tue, 24 Sep 2002 14:33:05 +1000



> > What's the best spam/ham proportion to go with on the current set of
> > things to be tested? As close to the real proportions as possible?
> > 1:1? 2:1? 10:1?
> My advice is get as close to real life as you can.  Every deviation just
> opens the door for a hundred gremlins to sneak in.  I can't say that I know
> of any specific gremlin here, but nobody has run controlled experiments to
> see whether or how much this matters, and in my view that means we don't
> know anything about it yet -- we can only guess.

Hm. I haven't kept all of my ham for the last couple of weeks, but I can
certainly grab a fair slab of it.

> You could do us a great service by picking one of the better schemes on the
> table,

Suggestions? I'm starting to feel a little dizzy at the number of options 
at the moment - what do you want tested? For that matter, what should I 
be feeding my mungo-set into? Courtesy of several hours of boredom waiting 
for my partner to come out of surgery, I _think_ I've now got the complete 
set of info@ekit.com email integrated and most of the badly filed stuff out 
of the way. There are 10 sets, each of 1000 spam and 1950 ham. Unfortunately,
while the laptop has a 1.something GHz CPU, it's only got 256MB of RAM, so
running the big tests takes a fair while...
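
For reference, here is a rough sketch of how one set of 1000 spam and 1950 ham
could be trimmed to the proportions asked about above (1:1, 2:1, 10:1). It is
not part of the spambayes code; the function name, its arguments, and the
placeholder message lists are all made up for illustration, and it assumes the
messages of one set are already loaded into two Python lists.

```python
import random

def subsample_for_ratio(spam, ham, spam_parts, ham_parts, seed=0):
    """Return (spam_subset, ham_subset) whose sizes are exactly in the
    ratio spam_parts:ham_parts, keeping as many messages as possible.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # fixed seed so a test run can be repeated
    # Largest whole multiplier k with k*spam_parts <= len(spam)
    # and k*ham_parts <= len(ham).
    k = min(len(spam) // spam_parts, len(ham) // ham_parts)
    n_spam = k * spam_parts
    n_ham = k * ham_parts
    # Sample without replacement so no message is used twice.
    return rng.sample(spam, n_spam), rng.sample(ham, n_ham)

# Placeholder data standing in for one set of 1000 spam and 1950 ham.
spam = ["spam-%d" % i for i in range(1000)]
ham = ["ham-%d" % i for i in range(1950)]

for spam_parts, ham_parts in [(1, 1), (2, 1), (10, 1)]:
    s, h = subsample_for_ratio(spam, ham, spam_parts, ham_parts)
    print(spam_parts, ham_parts, len(s), len(h))
```

With these set sizes the three ratios keep all 1000 spam and cut the ham to
1000, 500 and 100 messages respectively. Used as-is, each set sits at roughly
1:1.95 spam to ham.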

Anthony

-- 
Anthony Baxter     <anthony@interlink.com.au>   
It's never too late to have a happy childhood.