[Spambayes] Outlook plugin - training
papaDoc
papaDoc@videotron.ca
Mon Nov 11 13:03:40 2002
Hi,
Can someone define what a hapax is?
>Scores remain grossly hapax-driven, but that's actually enough to classify
>most of my email correctly: a small number of subjects and senders and
>mailing lists overwhelmingly dominate my ham mix, and one email account
>accounts for the vast bulk of my spam. Removing the hapaxes from the
>database dropped the # of words from 5500 to about 1700. Rescoring the
>inbox with this reduced database then pushed about 5% of the msgs back into
>Unsure.
>
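To make the term concrete: a hapax is a word that occurs only once in the training data. The sketch below is a minimal illustration, assuming a hapax is a word seen in exactly one trained message. The function names (message_counts, drop_hapaxes) and the sample messages are hypothetical and are not part of the Spambayes code, whose database also keeps separate spam and ham counts per word.

```python
from collections import Counter

def message_counts(messages):
    """Count, for each word, how many messages contain it."""
    counts = Counter()
    for text in messages:
        # set() so a word repeated inside one message counts once
        counts.update(set(text.lower().split()))
    return counts

def drop_hapaxes(counts):
    """Return a copy of the counts without words seen in only one message."""
    return Counter({word: n for word, n in counts.items() if n > 1})

# Hypothetical training data, for illustration only.
trained = ["cheap pills now", "cheap meeting notes", "meeting agenda now"]

counts = message_counts(trained)
hapaxes = sorted(word for word, n in counts.items() if n == 1)
reduced = drop_hapaxes(counts)

print("hapaxes:", hapaxes)
print("words before:", len(counts), "after:", len(reduced))
```

With so few messages, half of the words are hapaxes, which is the same effect as the drop from 5500 to about 1700 words described above, only on a tiny scale.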
>So (no surprise here) hapaxes are vital with little training data. That
>also means that as soon as one of those words shows up in the other kind of
>email, it changes from a strong clue to neutral, *provided that* I actually
>train on the new email. I'm not training now unless there's a
>mistake/unsure, so the hapaxes remain strong clues (even when they point in
>the wrong direction). BTW, when there are mistakes/unsures, I'm not
>training on all of them: as I did when I got up, I train the worst example
>then rescore, one at a time, until no mistakes/unsures remain.
>
>
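The "train the worst example, then rescore" routine quoted above could be sketched roughly as follows. This is only an illustration under stated assumptions: ToyClassifier averages per-word spam ratios and is not Spambayes's real scoring, and the cutoffs, the error measure, and all names here are hypothetical.

```python
from collections import defaultdict

HAM_CUTOFF, SPAM_CUTOFF = 0.2, 0.9  # hypothetical thresholds

class ToyClassifier:
    """Stand-in scorer: the average spam ratio of the known words."""

    def __init__(self):
        self.spam = defaultdict(int)
        self.ham = defaultdict(int)

    def train(self, text, is_spam):
        table = self.spam if is_spam else self.ham
        for word in set(text.split()):
            table[word] += 1

    def score(self, text):
        probs = []
        for word in set(text.split()):
            s, h = self.spam.get(word, 0), self.ham.get(word, 0)
            if s + h:  # skip words never seen in training
                probs.append(s / (s + h))
        return sum(probs) / len(probs) if probs else 0.5

def error(clf, text, is_spam):
    """0.0 if the message lands in the right category, else how far off it is."""
    s = clf.score(text)
    if is_spam:
        return 0.0 if s >= SPAM_CUTOFF else 1.0 - s
    return 0.0 if s <= HAM_CUTOFF else s

def train_worst_first(clf, inbox, max_rounds=100):
    """Train on the single worst message, rescore, repeat until all are right."""
    trained = []
    for _ in range(max_rounds):
        worst = max(inbox, key=lambda msg: error(clf, *msg))
        if error(clf, *worst) == 0.0:
            break  # no mistakes or unsures remain
        clf.train(*worst)
        trained.append(worst)
    return trained

# Hypothetical inbox of (text, is_spam) pairs.
inbox = [
    ("cheap pills", True),
    ("cheap pills today", True),
    ("meeting agenda", False),
    ("agenda for meeting today", False),
]
clf = ToyClassifier()
trained = train_worst_first(clf, inbox)
print(len(trained), "of", len(inbox), "messages needed training")
```

In this toy run only two of the four messages get trained. Every word in the database is then a hapax, and those hapaxes alone are enough to classify the other two messages.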
papaDoc
P.S. Someday I will contribute to the code, but first I need to learn Python.