[Spambayes] sb_imapfilter.py AssertionError: hamcount <= nham

Tony Lownds tony-bayes at lownds.com
Mon Dec 1 13:35:28 EST 2003



At 10:57 AM -0500 12/1/03, papaDoc wrote:
>Hi Tony,
>
>>>  This error says that you have a token in your database that has
>>>appeared in more ham than you have trained it on, which isn't possible.
>>
>>
>>Ah... while training it said 14 ham trained, while classifying it 
>>only said 10 ham.
>
>Do you force the training (i.e., use the -f switch)?
>sb_mboxtrain -f
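The invariant behind the quoted assertion can be checked directly. This is a minimal sketch, assuming (as the assertion implies) that each trained message adds at most 1 to a token's hamcount, and assuming the token counts have been loaded into a plain dict of `{token: (spamcount, hamcount)}`. That dict layout and the helper `find_impossible_tokens` are hypothetical illustrations, not the SpamBayes storage API.

```python
# Hypothetical diagnostic. token_counts is an assumed plain dict
# {token: (spamcount, hamcount)}; it is NOT how SpamBayes stores its database.

def find_impossible_tokens(nham, nspam, token_counts):
    """Return tokens whose counts exceed the number of trained messages.

    If each trained message adds at most 1 to a token's count, then
    hamcount <= nham and spamcount <= nspam must always hold.
    """
    bad = {}
    for token, (spamcount, hamcount) in token_counts.items():
        if hamcount > nham or spamcount > nspam:
            bad[token] = (spamcount, hamcount)
    return bad


if __name__ == "__main__":
    # Numbers echo the thread: 14 ham were trained, but only 10 were recorded.
    counts = {
        "tok_a": (0, 14),  # hypothetical token seen in all 14 trained ham
        "tok_b": (3, 2),
    }
    print(find_impossible_tokens(nham=10, nspam=45, token_counts=counts))
```

With 14 ham trained but only 10 recorded, any token present in more than 10 of those messages trips the check, which matches the 14-versus-10 discrepancy reported above.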

I'm using sb_imapfilter.py, not sb_mboxtrain. sb_imapfilter.py does
not have an -f switch. I removed spambayes.messageinfo.db before
training. That should do the same thing as -f,
right?

Here are the commands I use to reproduce this:

[tony ~]$ rm hammie.db spambayes.messageinfo.db
[tony ~]$ /usr/bin/sb_imapfilter.py -t
[tony ~]$ /usr/bin/sb_imapfilter.py -c

I found something else that is interesting. When I train again, it
trains 4 more ham messages, which should already have been trained. Here
I train twice in a row, and each time it trains 8 more messages (4 ham, 4 spam):

[tony ~]$ /usr/bin/sb_imapfilter.py -t
SpamBayes IMAP Filter Beta1, version 0.1 (September 2003),
using SpamBayes IMAP Filter Web Interface Alpha2, version 0.02
and engine SpamBayes Beta2, version 0.2 (July 2003).

Loading state from hammie.db database
hammie.db is an existing database, with 45 spam and 12 ham
Loading database hammie.db... Done.
Training
    Training ham folder INBOX.Ham
.......***...*..       4 trained.
    Training spam folder INBOX.Spam
..............***...................*........       4 trained.
Persisting hammie.db state in database
Training took 0.65074801445 seconds, 8 messages were trained

[tony ~]$ /usr/bin/sb_imapfilter.py -t
SpamBayes IMAP Filter Beta1, version 0.1 (September 2003),
using SpamBayes IMAP Filter Web Interface Alpha2, version 0.02
and engine SpamBayes Beta2, version 0.2 (July 2003).

Loading state from hammie.db database
hammie.db is an existing database, with 45 spam and 12 ham
Loading database hammie.db... Done.
Training
    Training ham folder INBOX.Ham
.......***...*..       4 trained.
    Training spam folder INBOX.Spam
..............***...................*........       4 trained.
Persisting hammie.db state in database
Training took 0.65074801445 seconds, 8 messages were trained
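Retraining the same 8 messages on every run suggests that the "already trained" lookup is not matching those messages. The sketch below shows what idempotent training looks like, assuming each message has a stable key (here a hypothetical Message-ID string). The class `HamTrainer` and its methods are hypothetical; this is not how sb_imapfilter.py records trained messages, only an illustration of skipping messages that were trained before.

```python
# Hypothetical sketch of idempotent ham training. Message keys are assumed
# stable across runs; sb_imapfilter.py's real bookkeeping may differ.

class HamTrainer:
    def __init__(self):
        self.nham = 0             # number of ham messages trained
        self.hamcounts = {}       # token -> number of ham messages containing it
        self.trained_keys = set() # keys of messages already trained

    def train(self, key, tokens):
        """Train one ham message; return True only if it was newly trained."""
        if key in self.trained_keys:
            return False          # already trained: leave all counts unchanged
        self.trained_keys.add(key)
        self.nham += 1
        for token in set(tokens): # count each token at most once per message
            self.hamcounts[token] = self.hamcounts.get(token, 0) + 1
        return True

    def check(self):
        """True if every token satisfies hamcount <= nham."""
        return all(count <= self.nham for count in self.hamcounts.values())


if __name__ == "__main__":
    trainer = HamTrainer()
    msgs = [("<1@example>", ["a", "b"]), ("<2@example>", ["a"])]
    print(sum(trainer.train(k, toks) for k, toks in msgs))  # 2
    print(sum(trainer.train(k, toks) for k, toks in msgs))  # 0: nothing retrained
```

If the lookup never matches (for example, because the key that is stored differs from the key that is checked on the next run), every run retrains the same messages, which is the symptom in the transcripts above.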


I'll try to figure out why sb_imapfilter.py is retraining those messages.

-Tony