[spambayes-dev] Mozilla SpamBayes "porting"

Thu Feb 19 11:49:24 EST 2004

>>The corpus I used has 2,793 hams and 948 hams.
> 
> 
> I presume one of those is spam, or you've got amazing results <wink>.  If
> your testing setup allows it, have a go with 948 of each, and see what that
> does.
That was 948 spams.  I'll do some tests with equal numbers of spams and hams.
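
For what it's worth, the balanced test could be set up roughly like this. It is only a sketch: the `balance` helper and the placeholder message names are made up for illustration, and it assumes each corpus is simply a list of messages already in memory.

```python
import random

def balance(hams, spams, seed=0):
    """Randomly downsample the larger class so both classes have equal size."""
    rng = random.Random(seed)  # fixed seed so the subsample is repeatable
    n = min(len(hams), len(spams))
    return rng.sample(hams, n), rng.sample(spams, n)

# Placeholder corpora with the sizes mentioned in this thread.
hams = [f"ham-{i}" for i in range(2793)]
spams = [f"spam-{i}" for i in range(948)]

bal_hams, bal_spams = balance(hams, spams)
print(len(bal_hams), len(bal_spams))
```

Fixing the seed keeps the same subset across runs, so a change in results comes from the classifier rather than from the sample.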

> If you've got the corpus lying around in one-text-file-per-email format,
> then the easiest way to test would be to install Python and SpamBayes, and
> run timcv.py over the corpus and see if the results you get look something
> like the Mozilla ones (I suppose you could do -n2 to simulate splitting the
> corpus in half, rather than more sets as is common here).
The problem is that the tokenizers are different, so it's not possible to 
compare the results since the classifiers are fed different tokens.
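
To illustrate the point, here is a toy sketch with two made-up tokenizers. Neither one is the real SpamBayes or Mozilla tokenizer; they are assumptions chosen only to show how little the token streams can overlap for the same message.

```python
import re

def tokenize_whitespace(text):
    """Hypothetical tokenizer A: split on whitespace, keep case and punctuation."""
    return text.split()

def tokenize_words(text):
    """Hypothetical tokenizer B: lowercase, keep only alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())

msg = "FREE offer!!! Click http://example.com now"
a = set(tokenize_whitespace(msg))
b = set(tokenize_words(msg))

print("only in A:", sorted(a - b))
print("only in B:", sorted(b - a))
print("shared:   ", sorted(a & b))
```

With so few shared tokens, the two classifiers end up with different per-token statistics, so differences in error rates can't be pinned on the classifier alone.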

>>It's kind of tough to test since we don't have the nice 
>>cross-validation tools that you have
> 
> 
> You could write some, of course <wink>.
I suppose I could, or I could let you guys do the testing and then copy your 
results into Mozilla ;-)
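
For reference, a bare-bones cross-validation splitter doesn't take much. This is a generic sketch and not how timcv.py does it; the function names are invented here, and training and scoring are left out entirely.

```python
def folds(items, n):
    """Split items into n roughly equal folds by striding through the list."""
    return [items[i::n] for i in range(n)]

def cross_validation_splits(items, n):
    """Yield (train, test) pairs, holding out one fold at a time."""
    parts = folds(items, n)
    for i, test in enumerate(parts):
        train = [x for j, part in enumerate(parts) if j != i for x in part]
        yield train, test

# Stand-in for a list of messages; n=2 mimics splitting the corpus in half.
messages = list(range(10))
for train, test in cross_validation_splits(messages, 2):
    print(len(train), len(test))
```

Each message lands in exactly one test fold, so every message is scored once by a classifier that never trained on it.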