[Spambayes] Outlook plugin - training

papaDoc papaDoc@videotron.ca
Mon Nov 11 13:03:40 2002


Hi,

Can someone define what a hapax is?

>Scores remain grossly hapax-driven, but that's actually enough to classify
>most of my email correctly:  a small number of subjects and senders and
>mailing lists overwhelmingly dominate my ham mix, and one email account
>accounts for the vast bulk of my spam.  Removing the hapaxes from the
>database dropped the # of words from 5500 to about 1700.  Rescoring the
>inbox with this reduced database then pushed about 5% of the msgs back into
>Unsure.
>
>So (no surprise here) hapaxes are vital with little training data.  That
>also means that as soon as one of those words shows up in the other kind of
>email, it changes from a strong clue to neutral, *provided that* I actually
>train on the new email.  I'm not training now unless there's a
>mistake/unsure, so the hapaxes remain strong clues (even when they point in
>the wrong direction).  BTW, when there are mistakes/unsures, I'm not
>training on all of them:  as I did when I got up, I train the worst example
>then rescore, one at a time, until no mistakes/unsures remain.

papaDoc

P.S. Someday I will contribute to the code, but first I need to learn Python.
