RE: [Spambayes] sb_imapfilter.py AssertionError: hamcount <= nham
When I classify through sb_imapfilter.py, I am getting an AssertionError. Any ideas? I am using spambayes from CVS; courier IMAP; python 2.2.2; and a fresh database. See below for commands. [...] assert hamcount <= nham AssertionError
How fresh? This error says that you have a token in your database that has appeared in more ham than you have trained it on - which isn't possible. IOW, the database is corrupt. You can try to manually fix the db via db_expimp.py, but it's easier (especially if you call it fresh already) to just retrain from scratch. If this happens regularly, it would be great to know the sequence of events that can reproduce it (in a sf bug tracker <http://sf.net/projects/spambayes>), as we still don't really know what causes this error. =Tony Meyer
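The invariant behind this assertion can be sketched as follows. This is only a minimal illustration, not the SpamBayes code: the function name `find_inconsistent_tokens` and the plain-dict layout (token mapped to a `(spamcount, hamcount)` pair) are assumptions made for the example. Only `hamcount` and `nham` come from the assertion itself.

```python
def find_inconsistent_tokens(token_counts, nspam, nham):
    """Return tokens whose counts exceed the number of trained messages.

    token_counts: hypothetical dict mapping token -> (spamcount, hamcount).
    nspam, nham: how many spam and ham messages the database claims
                 to have been trained on.
    """
    bad = []
    for token, (spamcount, hamcount) in sorted(token_counts.items()):
        # A token cannot have appeared in more ham (or spam) messages
        # than were trained in total, so a larger count means the
        # stored totals and the token records disagree.
        if hamcount > nham or spamcount > nspam:
            bad.append(token)
    return bad


# Hypothetical data: the database reports 10 ham, yet one token was seen in 14.
counts = {"meeting": (0, 14), "viagra": (40, 0), "hello": (3, 9)}
print(find_inconsistent_tokens(counts, nspam=44, nham=10))
```

Any token returned by such a check would trip the `hamcount <= nham` assertion at classification time.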
At 12:19 PM +1300 12/1/03, Tony Meyer wrote:
When I classify through sb_imapfilter.py, I am getting an AssertionError. Any ideas? I am using spambayes from CVS; courier IMAP; python 2.2.2; and a fresh database. See below for commands. [...] assert hamcount <= nham AssertionError
How fresh?
Very... I remove the database files right before training.
[tony ~]$ rm hammie.db spambayes.messageinfo.db
[tony ~]$ /usr/bin/sb_imapfilter.py -t
This error says that you have a token in your database that has appeared in more ham than you have trained it on - which isn't possible.
Ah... while training it said 14 ham trained, while classifying it only said 10 ham.
Training ham folder INBOX.Ham ************** 14 trained.
...
hammie.db is an existing database, with 44 spam and 10 ham
I didn't notice that before.
If this happens regularly, it would be great to know the sequence of events that can reproduce it (in a sf bug tracker <http://sf.net/projects/spambayes>), as we still don't really know what causes this error.
Sure, bug #852137, although without access to my IMAP server I don't see how it will be reproducible. Has anyone used Courier IMAP? Maybe the way it returns message identifiers is causing problems.
-Tony
Hi Tony,
This error says that you have a token in your database that has appeared in more ham than you have trained it on - which isn't possible.
Ah... while training it said 14 ham trained, while classifying it only said 10 ham.
Do you force the training (i.e. use the -f switch)?
sb_mboxtrain -f
If you don't force the training, old mail will still be marked as trained but won't be included in the database. (This is a feature so that we don't train on the same email many times.) If you force the training, even old mail marked as trained will be used for the training. When you train from scratch you should always force the training.
Remi
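The skip-unless-forced behaviour described above can be sketched roughly like this. It is a hypothetical model, not the sb_mboxtrain implementation: `train_folder`, `trained_ids`, and `token_counts` are names invented for the example, and the "already trained" marks are modelled as a plain set of message ids.

```python
def train_folder(messages, trained_ids, token_counts, force=False):
    """Train on (msg_id, tokens) pairs and return how many were trained.

    trained_ids: set of message ids already marked as trained.
    token_counts: dict mapping token -> number of trained messages containing it.
    force: when True, train even on messages already marked as trained.
    """
    trained = 0
    for msg_id, tokens in messages:
        if msg_id in trained_ids and not force:
            continue  # already marked as trained; skip to avoid double counting
        for token in set(tokens):
            token_counts[token] = token_counts.get(token, 0) + 1
        trained_ids.add(msg_id)
        trained += 1
    return trained


messages = [("m1", ["a", "b"]), ("m2", ["b"])]

# Fresh (empty) database, but m1 is still marked as trained from an earlier run.
db = {}
print(train_folder(messages, {"m1"}, db), db)

# Same situation with force=True: both messages end up in the new database.
forced_db = {}
print(train_folder(messages, {"m1"}, forced_db, force=True), forced_db)
```

In the first call only `m2` reaches the new database, which is why a from-scratch retrain without forcing can end up with fewer messages than expected.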
At 10:57 AM -0500 12/1/03, papaDoc wrote:
Hi Tony,
This error says that you have a token in your database that has appeared in more ham than you have trained it on - which isn't possible.
Ah... while training it said 14 ham trained, while classifying it only said 10 ham.
Do you force the training (i.e. use the -f switch) ? sb_mboxtrain -f
I'm using sb_imapfilter.py, not sb_mboxtrain. sb_imapfilter.py does not have an -f switch. I removed spambayes.messageinfo.db before training the database. That should do the same thing as -f, right?

Here are the commands I use to reproduce this:

[tony ~]$ rm hammie.db spambayes.messageinfo.db
[tony ~]$ /usr/bin/sb_imapfilter.py -t
[tony ~]$ /usr/bin/sb_imapfilter.py -c

I found something else that is interesting. When I train again, it trains 4 more messages, which should have been trained already. Here I train twice in a row and it trains 8 more messages each time:

[tony ~]$ /usr/bin/sb_imapfilter.py -t
SpamBayes IMAP Filter Beta1, version 0.1 (September 2003), using SpamBayes IMAP Filter Web Interface Alpha2, version 0.02 and engine SpamBayes Beta2, version 0.2 (July 2003).
Loading state from hammie.db database
hammie.db is an existing database, with 45 spam and 12 ham
Loading database hammie.db... Done.
Training
Training ham folder INBOX.Ham .......***...*.. 4 trained.
Training spam folder INBOX.Spam ..............***...................*........ 4 trained.
Persisting hammie.db state in database
Training took 0.65074801445 seconds, 8 messages were trained
[tony ~]$ /usr/bin/sb_imapfilter.py -t
SpamBayes IMAP Filter Beta1, version 0.1 (September 2003), using SpamBayes IMAP Filter Web Interface Alpha2, version 0.02 and engine SpamBayes Beta2, version 0.2 (July 2003).
Loading state from hammie.db database
hammie.db is an existing database, with 45 spam and 12 ham
Loading database hammie.db... Done.
Training
Training ham folder INBOX.Ham .......***...*.. 4 trained.
Training spam folder INBOX.Spam ..............***...................*........ 4 trained.
Persisting hammie.db state in database
Training took 0.65074801445 seconds, 8 messages were trained

I'll try to figure out why sb_imapfilter.py is retraining those messages.
-Tony
On Mon, 1 Dec 2003 10:35:28 -0800, Tony Lownds <tony-bayes@lownds.com> wrote:
At 10:57 AM -0500 12/1/03, papaDoc wrote:
Hi Tony,
This error says that you have a token in your database that has appeared in more ham than you have trained it on - which isn't possible.
Ah... while training it said 14 ham trained, while classifying it only said 10 ham.
Do you force the training (i.e. use the -f switch) ? sb_mboxtrain -f
I'm using sb_imapfilter.py, not sb_mboxtrain. sb_imapfilter.py does not have an -f switch. I removed spambayes.messageinfo.db before training the database. That should do the same thing as -f, right?
Not necessarily. If you remove the messageinfo db, then sb has forgotten what messages it has already trained on, and will quite possibly train a message that should otherwise be ignored... If you do this, then you must start with a completely new training database as well...
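A toy model of this failure mode, purely for illustration: `ToyTrainer` and its attributes are hypothetical and do not mirror SpamBayes internals. The `seen` set stands in for the messageinfo db, and `nham` plus `ham_counts` stand in for the training database.

```python
class ToyTrainer:
    """Hypothetical stand-in for a trainer with two separate stores."""

    def __init__(self):
        self.nham = 0          # number of ham messages trained
        self.ham_counts = {}   # token -> number of ham messages containing it
        self.seen = set()      # plays the role of the message-info database

    def train_ham(self, msg_id, tokens):
        if msg_id in self.seen:
            return False       # already trained, ignore
        self.seen.add(msg_id)
        self.nham += 1
        for token in set(tokens):
            self.ham_counts[token] = self.ham_counts.get(token, 0) + 1
        return True


trainer = ToyTrainer()
trainer.train_ham("msg-1", ["hello", "world"])

# Forget which messages were trained, but keep the training data.
trainer.seen.clear()
trainer.train_ham("msg-1", ["hello", "world"])

# The single message has now been counted twice.
print(trainer.nham, trainer.ham_counts["hello"])
```

In this toy model both counters grow together, so it shows the double training but does not by itself reproduce the `hamcount <= nham` assertion failure.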
--
Vous exprimer; Exprésese; Te stesso esprimere; Express yourself! Tim Stone See my photography at www.fourstonesExpressions.com See my writing at www.xanga.com/obj3kshun
participants (4)
- papaDoc
- Tim Stone
- Tony Lownds
- Tony Meyer