[spambayes-dev] Idea for multi-user spambayes
tim.peters at gmail.com
Sun Nov 27 21:23:18 CET 2005
> I have an idea for a spambayes variation that should be more suited
> to multi-user systems. The goal is to make the DB somewhat
> conditionalized based on recipient address. In addition to storing
> <token>, spambayes could also save (<recipient>, <token>). When
> scoring a message, the probability for (<recipient>, <token>) would
> be added to the evidence as well as for <token>.
Offhand I think it would make more sense to ignore <token> when a
(<recipient>, <token>) pair (for the same <token> and the given
<recipient>) is known. For example, if a urologist trains on "penis"
as ham, it's not doing him a favor to fold in that it's spam to almost
> I'm looking at chi2_spamprob() and wondering if this is valid,
There's really no sense in which chi2_spamprob() computes "a
probability" -- it works or it doesn't. Heh.
> Is there some better way to include the (<recipient>, <token>) evidence?
Test some ;-)
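One variation worth testing is the prefer-the-pair lookup suggested above: use the (<recipient>, <token>) statistic when it exists, and fall back to the plain <token> statistic only when it doesn't. A minimal sketch follows. It assumes a plain dict of already-computed per-clue spam probabilities, and the names `lookup_spamprob`, `probs`, the example addresses, and the 0.5 default for unknown clues are all hypothetical illustrations, not the spambayes classifier API.

```python
def lookup_spamprob(probs, recipient, token, unknown=0.5):
    """Prefer the (recipient, token) probability; else fall back to the token's.

    `probs` is a hypothetical dict mapping either a bare token or a
    (recipient, token) tuple to an already-computed spam probability.
    """
    pair = (recipient, token)
    if pair in probs:
        # A recipient-specific statistic is known, so ignore the global one.
        return probs[pair]
    # No recipient-specific statistic: use the global token, or the
    # assumed neutral value if the token has never been trained on.
    return probs.get(token, unknown)


# Made-up numbers purely for illustration.
probs = {
    "penis": 0.99,                              # spam for most recipients
    ("urologist@example.com", "penis"): 0.05,   # ham for this recipient
}

for_urologist = lookup_spamprob(probs, "urologist@example.com", "penis")
for_other = lookup_spamprob(probs, "someone@example.com", "penis")
for_unseen = lookup_spamprob(probs, "someone@example.com", "zyzzyva")
```

The alternative from the original proposal (feeding both the pair and the bare token into the combining step) would return both values instead of one, so the two schemes can be compared on the same training data.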
> BTW, if this idea actually works, using (<sender>, <token>) may also
> be helpful.
Spam sender addresses typically change rapidly, while ham sender
addresses typically don't. So I expect this would add major boosts to
the tokens sent by ham senders, and typically create a ton of hapaxes
from spam senders (due to the spam <sender> addresses constantly