[spambayes-dev] Spam and ham count can be negative !

papaDoc papaDoc at videotron.ca
Fri Sep 19 08:28:57 EDT 2003


Hi,


I'm trying to patch sb_mboxtrain to train only on email that is not more
than x days old.  I want to do that since I keep all my email, and
if I train on all of it the db becomes too big.
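
A minimal sketch of such an age check, assuming the message's Date header
decides its age. The names is_recent and max_age_days are hypothetical and
not part of sb_mboxtrain; messages with a missing or unparsable date are
skipped here, which is only one possible choice.

```python
from datetime import datetime, timedelta, timezone
from email.message import Message
from email.utils import parsedate_to_datetime


def is_recent(msg, max_age_days, now=None):
    """Return True if msg's Date header is at most max_age_days old.

    Hypothetical helper: not part of sb_mboxtrain.
    """
    now = now or datetime.now(timezone.utc)
    raw = msg.get("Date")
    if raw is None:
        return False  # no Date header: skip rather than guess
    try:
        sent = parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        return False  # unparsable Date header
    if sent.tzinfo is None:
        # Assumption: treat dates without a usable zone as UTC.
        sent = sent.replace(tzinfo=timezone.utc)
    return now - sent <= timedelta(days=max_age_days)
```

A training loop could then call is_recent(msg, x) on each message and skip
the ones for which it returns False.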

When I try my new sb_mboxtrain I get the following message (or
something similar): "The spam count will be negative."


OK, let's see if this is possible.

I use sb_mboxtrain to create my database with only 10 spams and 10
hams.  (This is a small amount, since I want some errors for the purpose
of the demonstration.)

I use hammie to filter my mail for several days. Since it makes some
errors, I move some emails from the spam folder to the ham folder and
vice versa...

Every night I run sb_mboxtrain on my ham and spam folders.
Look at the code below (the original part of sb_mboxtrain).
You can see that if a header exists for the message (e.g. hammie has created
the spam header), then the message is untrained before being retrained as the
right ham or spam.
The problem is that I NEVER TRAINED the database on this email.
So the spam or ham count can become negative!

The second problem: I don't know how to solve this...

    if is_spam:
        spamtxt = options["Headers", "header_spam_string"]
    else:
        spamtxt = options["Headers", "header_ham_string"]
    oldtxt = msg.get(TRAINED_HDR)
    if force:
        # Train no matter what.
        if oldtxt != None:
            del msg[TRAINED_HDR]
    elif oldtxt == spamtxt:
        # Skip this one, we've already trained with it.
        return False
    elif oldtxt != None:
        # It's been trained, but as something else.  Untrain.
        del msg[TRAINED_HDR]
        h.untrain(msg, not is_spam)
    h.train(msg, is_spam)
    msg.add_header(TRAINED_HDR, spamtxt)

    return True
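
The effect can be reproduced with a toy model. This is only a sketch under
the assumption described above: a message carries a "trained" marker although
the database was never trained on it. ToyCounts and retrain are hypothetical
stand-ins that mimic the branch logic of the excerpt; they are not the real
spambayes classes.

```python
class ToyCounts:
    """Hypothetical stand-in for the classifier's spam/ham message counters."""

    def __init__(self, nspam=0, nham=0):
        self.nspam = nspam
        self.nham = nham

    def train(self, is_spam):
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1

    def untrain(self, is_spam):
        # No guard here, so the counter is free to drop below zero.
        if is_spam:
            self.nspam -= 1
        else:
            self.nham -= 1


def retrain(counts, old_label, is_spam):
    """Mimic the excerpt: untrain if marked as the other class, then train."""
    new_label = "spam" if is_spam else "ham"
    if old_label == new_label:
        return False  # already trained as this class
    if old_label is not None:
        counts.untrain(not is_spam)  # assumes an earlier train() happened
    counts.train(is_spam)
    return True


# The database really contains one spam and one ham.
counts = ToyCounts(nspam=1, nham=1)

# Two messages are marked "spam" but were never trained; the user has
# moved both to the ham folder, so each one is "untrained" as spam.
for _ in range(2):
    retrain(counts, old_label="spam", is_spam=False)

print(counts.nspam, counts.nham)  # -1 3
```

Each retraining subtracts a spam that was never added, so two such messages
are enough to push a spam count of 1 down to -1.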


Remi
papaDoc at videotron.ca




