[spambayes-dev] Another piece of anecdotal evidence

Eli Stevens (WG.c) listsub at wickedgrey.com
Wed Jan 14 14:17:55 EST 2004


Skip Montanaro wrote:

>     Alex> Total:    4694 ham, 39913 spam (89.48% spam)
>     Alex> Trained:   204 ham, 10994 spam (98.18% spam)
> 
>     Alex> Having such a high imbalance does seem to make me particularly
>     Alex> susceptible to training errors... but doesn't seem to hurt
>     Alex> otherwise.


Does it hurt more when an FP or FN is mistrained?


> How do you plan to find those mistrained messages?


Hmm...  How feasible is:

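trainEverything()

for msg in hamCorpus:
    untrain( msg )            # take this message out of the ham training data
    result = classify( msg )  # score it as if it had never been trained on
    if result == spam:
        display( msg )        # looks spammy on its own: possible mistrain
    train( msg )              # put it back before moving on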
This won't work if the mistrained messages are not very spammy, but in 
that case they shouldn't be affecting classification adversely, right?
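
As a rough, runnable sketch of that loop: the Python below uses a toy
token-count classifier, not the SpamBayes one. ToyClassifier,
find_suspect_ham, the threshold parameter, and the add-one smoothing are
all assumptions made for illustration, and messages are assumed to be
already tokenized into lists of strings.

```python
import math
from collections import Counter


class ToyClassifier:
    """Tiny token-count classifier; a stand-in, NOT the SpamBayes classifier."""

    def __init__(self):
        # is_spam flag -> number of trained messages containing each token
        self.counts = {True: Counter(), False: Counter()}
        # is_spam flag -> number of trained messages
        self.nmsgs = {True: 0, False: 0}

    def train(self, tokens, is_spam):
        self.counts[is_spam].update(set(tokens))
        self.nmsgs[is_spam] += 1

    def untrain(self, tokens, is_spam):
        # Exact inverse of train(), so train/untrain pairs cancel out.
        self.counts[is_spam].subtract(set(tokens))
        self.nmsgs[is_spam] -= 1

    def spam_score(self, tokens):
        """Sum of per-token log likelihood ratios; > 0 leans spam."""
        score = 0.0
        for tok in set(tokens):
            # Add-one smoothing (an assumption) keeps unseen tokens finite.
            p_spam = (self.counts[True][tok] + 1) / (self.nmsgs[True] + 2)
            p_ham = (self.counts[False][tok] + 1) / (self.nmsgs[False] + 2)
            score += math.log(p_spam / p_ham)
        return score


def find_suspect_ham(clf, ham_corpus, threshold=0.0):
    """Leave-one-out pass over ham: flag messages that score as spam
    once their own training contribution has been removed."""
    suspects = []
    for tokens in ham_corpus:
        clf.untrain(tokens, is_spam=False)
        if clf.spam_score(tokens) > threshold:
            suspects.append(tokens)
        clf.train(tokens, is_spam=False)  # restore the classifier state
    return suspects
```

Each message is scored against a classifier that has never seen it, so a
spammy message sitting in the ham corpus cannot vouch for itself. A
mirrored loop over the spam corpus (untrain as spam, flag scores below
the threshold) would look for the opposite kind of mistake.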

Eli



