[spambayes-dev] I took a big step Tuesday...
T. Alexander Popiel
popiel at wolfskeep.com
Thu Jul 24 10:22:38 EDT 2003
In message: <16159.64832.752182.993382 at montanaro.dyndns.org>
Skip Montanaro <skip at pobox.com> writes:
>After having used Spambayes since last September and scanning all messages
>marked as spam during that time, I made the decision a couple days ago to
>simply dump spam which scores 1.00 (or 1,00 if you've been following the
>recent locale saga). I mention it here to suggest that maybe it's
>worthwhile to consider creating finer-grained "spam" categories.
I think that there is some use to finer-grained categories, but
I'm not convinced that either (a) the score should be used for such
categorization, or (b) it needs to be done in spambayes.
>The step I took Tuesday was to simply dump mail which scores 1.00. That
>eliminates roughly 85% of the spam
I've adopted a slightly different rule: if both spambayes and
spamassassin agree that it's spam, then I toss it without looking
at it as the first step of my weekly spam management. This is
_not_ an automatic thing on delivery, so if I'm expecting something
that's likely to be spammy, I can save it, no matter how it scored.
Note that I don't run spamassassin myself, but several of my
upstreams do, and I get benefit thereby.
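To make the rule concrete, here is a minimal sketch of the "both agree" check, run by hand over a saved message rather than at delivery. It assumes spambayes has stamped an X-Spambayes-Classification header of the form "spam; 1.00" and that an upstream spamassassin has added X-Spam-Flag: YES; both header names and formats are assumptions about the local setup, and both_agree_spam is a made-up helper name.

```python
from email import message_from_string


def both_agree_spam(raw_message):
    """Return True only if both filters marked the message as spam.

    Assumed headers (adjust to whatever your setup actually adds):
      X-Spambayes-Classification: spam; 1.00
      X-Spam-Flag: YES
    """
    msg = message_from_string(raw_message)
    sb_header = msg.get("X-Spambayes-Classification", "")
    sa_header = msg.get("X-Spam-Flag", "")
    # The classification word is assumed to come before the ';'.
    sb_says_spam = sb_header.split(";")[0].strip().lower() == "spam"
    sa_says_spam = sa_header.strip().lower() == "yes"
    return sb_says_spam and sa_says_spam
```

A message missing either header is kept, so mail that never passed through an upstream spamassassin is not tossed by this check.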
>I don't recall the last time I saw a false positive, and the place where
>mistakes are most likely to be made are in the lower scoring spams.
I've had a couple false positives recently, both receipts from
online purchases. They both scored very high by spambayes and
>Relating that to spambayes-dev subject matter, perhaps a "super-spam" cutoff
>could be created which would automatically delete messages which score at or
>above that value if the user's training set was "large enough". Thus, if
>they started training from scratch it would have no effect. By default, it
>would be set to something > 1.0 to prevent it from coming into play
>unexpectedly. I don't know what "large enough" is though.
I don't think that such extra gradation needs to be in the spambayes
code; the obvious super-spam at 1.00 is easily matched by MDAs already,
and more generic benefit is derived from using a completely different
method such as spamassassin to further categorize.
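For reference, the quoted "super-spam" proposal can be sketched outside spambayes in a few lines. This is only an illustration under stated assumptions: the function name, the 0.90 and 0.20 cutoffs, and the min_trained value are placeholders, and "large enough" is left as a parameter because nobody in the thread has pinned it down.

```python
def disposition(score, n_trained, super_spam_cutoff=1.01,
                spam_cutoff=0.90, ham_cutoff=0.20, min_trained=1000):
    """Classify a message by score, with an optional delete tier.

    super_spam_cutoff defaults to a value above 1.0, so the delete
    tier never fires unless the user lowers it on purpose.
    min_trained is a hypothetical stand-in for "large enough".
    """
    if n_trained >= min_trained and score >= super_spam_cutoff:
        return "delete"
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"
    return "unsure"
```

With the defaults, a message scoring 1.00 is still only "spam"; it becomes "delete" only when the cutoff is lowered to 1.0 and the training set has reached min_trained messages.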