[Spambayes] training WAS: aging information

Wed Feb 19 09:53:48 EST 2003

In message:  <16E1010E4581B049ABC51D4975CEDB880113D91A at UKDCX001.uk.int.atosorigin.com>
             "Moore, Paul" <Paul.Moore at atosorigin.com> writes:
>From: D. R. Evans [mailto:N7DR at arrisi.com]
>> I saw a comment in the LJ article that one should train on roughly
>> equal numbers of spam and ham. Is this actually true? (This question
>> of course merely demonstrates that I'm too lazy to do the maths myself.)
>
>That's something I'd be interested in, too - particularly as the
>ham:spam ratio people get is utterly out of their control. I'm also
>too lazy - or possibly incompetent - to do the maths, but IIRC, there
>were some experiments done at one stage. A pointer to the relevant posts
>(or better still, a summary on the website) would be very useful.

I was the one who did the bulk of the ratio experiments, and I
posted my results at http://www.wolfskeep.com/~popiel/spambayes.

One thing to note about the experiments: I varied not only the
ham:spam ratio of the training set but also that of the testing
set.  This is not particularly realistic for gauging the effect of
skewing the training ratio for one particular person's live feed,
where the testing ratio (the mix of mail actually arriving) would
remain constant.
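A design that isolates the training-ratio effect holds the test set's ham:spam mix fixed and varies only the training mix. A minimal Python sketch of building such splits follows. The function `make_splits` and its parameters are hypothetical (not part of SpamBayes), messages are stand-in strings, and the actual train/score step is left out.

```python
import random
from typing import Dict, List, Sequence, Tuple

Msg = str  # hypothetical stand-in for a real message object
Labeled = List[Tuple[Msg, bool]]  # (message, is_spam)


def make_splits(
    ham: Sequence[Msg],
    spam: Sequence[Msg],
    n_test_ham: int,
    n_test_spam: int,
    n_train: int,
    train_spam_fractions: Sequence[float],
    seed: int = 0,
) -> Tuple[Labeled, Dict[float, Labeled]]:
    """Hypothetical helper: one fixed test set, one training set per ratio."""
    rng = random.Random(seed)
    ham, spam = list(ham), list(spam)
    rng.shuffle(ham)
    rng.shuffle(spam)

    # Carve off the test set once; its ham:spam ratio never changes.
    test = [(m, False) for m in ham[:n_test_ham]] + [
        (m, True) for m in spam[:n_test_spam]
    ]
    pool_ham, pool_spam = ham[n_test_ham:], spam[n_test_spam:]

    trains: Dict[float, Labeled] = {}
    for frac in train_spam_fractions:
        n_spam = round(n_train * frac)
        n_ham = n_train - n_spam
        if n_spam > len(pool_spam) or n_ham > len(pool_ham):
            raise ValueError(f"not enough messages for spam fraction {frac}")
        # Sample only from the held-out pool, so no test message
        # ever leaks into a training set.
        train = [(m, False) for m in rng.sample(pool_ham, n_ham)] + [
            (m, True) for m in rng.sample(pool_spam, n_spam)
        ]
        rng.shuffle(train)
        trains[frac] = train
    return test, trains
```

For example, `make_splits(ham, spam, 100, 100, 200, [0.25, 0.5, 0.75])` would give one 1:1 test set and three 200-message training sets that are 25%, 50% and 75% spam. Each classifier trained on one of those sets is then scored against the same test set, so only the training ratio differs between runs.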

It would be worthwhile to rerun similar experiments with current
versions of the code, too.

- Alex