[Spambayes] training WAS: aging information

T. Alexander Popiel popiel at wolfskeep.com
Wed Feb 19 15:28:30 EST 2003


In message:  <1ED4ECF91CDED24C8D012BCF2B034F1318CD64 at its-xchg4.massey.ac.nz>
             "Meyer, Tony" <T.A.Meyer at massey.ac.nz> writes:
>[Alex]
>> I was the one who did the bulk of the ratio experiments, and I
>> posted my results at http://www.wolfskeep.com/~popiel/spambayes.
>>
>> It would be worthwhile to rerun similar experiments with current
>> versions of the code, too.
>
>Thanks for (re)posting this link, certainly interesting reading.  Were
>these done before or after the experimental_ham_spam_in_balance code?

Before; I like to think that my results were in part responsible for
getting that option added.

>What I would like to know (and I suspect others would too) is this:
>say my stored mail has a ham:spam ratio of 300:3000.  Should I
>randomly choose 300 ham and have a 300:300 ratio?  Or is giving up the
>information in the other 2700 messages a bad thing?

Well, as long as the 300 ham chosen are actually representative of
the types of ham you get, I don't see any harm in only using 300.
I don't have the math or the experimental results to back that up,
though.
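
For concreteness, here is a minimal sketch of the random subsampling
being discussed: draw from the larger corpus until both sides are the
same size.  The function name balance_corpora and the plain lists of
message identifiers are hypothetical stand-ins for illustration; this
is not Spambayes code, and it says nothing about whether the balanced
set actually classifies better.

```python
import random


def balance_corpora(ham, spam, seed=None):
    """Randomly downsample the larger corpus to the size of the smaller.

    `ham` and `spam` are plain lists of messages (hypothetical
    representation, assumed for this sketch).
    """
    rng = random.Random(seed)
    n = min(len(ham), len(spam))
    # sample() draws without replacement, so no message is used twice.
    return rng.sample(ham, n), rng.sample(spam, n)


# Made-up corpora matching the 300:3000 example.
ham = ["ham-%d" % i for i in range(300)]
spam = ["spam-%d" % i for i in range(3000)]

ham_train, spam_train = balance_corpora(ham, spam, seed=0)
print(len(ham_train), len(spam_train))  # 300 300
```

A uniform random draw like this is only representative on average; if
your mail comes in distinct flavors, it is worth checking that each
flavor still shows up in the reduced set.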

>If someone was willing to do some more tests with the most recent code,
>I think lots of people would be interested.

I'm trying to, but life keeps interfering.

- Alex


