[spambayes-bugs] [ spambayes-Bugs-852137 ] sb_imapfilter.py AssertionError: hamcount <= nham

SourceForge.net noreply at sourceforge.net
Fri Dec 5 11:27:31 EST 2003


Bugs item #852137, was opened at 2003-12-01 15:39
Message generated for change (Comment added) made by tonylownds
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=852137&group_id=61702

Category: imapfilter
Group: Source code - CVS
Status: Open
Resolution: None
Priority: 5
Submitted By: Tony Lownds (tonylownds)
Assigned to: Tony Meyer (anadelonbrin)
Summary: sb_imapfilter.py AssertionError: hamcount <= nham

Initial Comment:
When I classify through sb_imapfilter.py, I am getting an 
AssertionError. Any ideas? I am using SpamBayes from CVS, 
Courier IMAP, Python 2.2.2, and a just-deleted database. See 
below for the commands, and further below for database dumps.

[tony ~]$ rm hammie.db spambayes.messageinfo.db
[tony ~]$ /usr/bin/sb_imapfilter.py -t
SpamBayes IMAP Filter Beta1, version 0.1 (September 2003),
using SpamBayes IMAP Filter Web Interface Alpha2, version 0.02
and engine SpamBayes Beta2, version 0.2 (July 2003).

Loading state from hammie.db database
hammie.db is a new database
Loading database hammie.db... Done.
Training
   Training ham folder INBOX.Ham
**************       14 trained.
   Training spam folder INBOX.Spam
********************************************       44 trained.
Persisting hammie.db state in database
Training took 2.87554502487 seconds, 58 messages were trained
[tony ~]$ /usr/bin/sb_imapfilter.py -c
SpamBayes IMAP Filter Beta1, version 0.1 (September 2003),
using SpamBayes IMAP Filter Web Interface Alpha2, version 0.02
and engine SpamBayes Beta2, version 0.2 (July 2003).

Loading state from hammie.db database
hammie.db is an existing database, with 44 spam and 10 ham
Loading database hammie.db... Done.
Classifying *.Traceback (most recent call last):
  File "/usr/bin/sb_imapfilter.py", line 821, in ?
    run()
  File "/usr/bin/sb_imapfilter.py", line 811, in run
    imap_filter.Filter()
  File "/usr/bin/sb_imapfilter.py", line 676, in Filter
    self.unsure_folder)
  File "/usr/bin/sb_imapfilter.py", line 595, in Filter
    evidence=True)
  File "/usr/lib/python2.2/site-packages/spambayes/classifier.py", line 158, in chi2_spamprob
    clues = self._getclues(wordstream)
  File "/usr/lib/python2.2/site-packages/spambayes/classifier.py", line 395, in _getclues
    prob = self.probability(record)
  File "/usr/lib/python2.2/site-packages/spambayes/classifier.py", line 242, in probability
    assert hamcount <= nham
AssertionError
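The assertion encodes a simple invariant: a token cannot appear in more ham messages than the database says were trained as ham. Below is a minimal, simplified sketch of a per-token probability calculation in that style. It is not SpamBayes' actual `probability()` code. The function name `word_spamprob` is hypothetical, and the smoothing defaults `s=0.45` and `x=0.5` are assumptions. It uses the numbers from this report: 14 ham messages trained, but hammie.db reporting only 10 ham. If the per-token counts reflect all 14 messages while the stored total says 10, then any token seen in more than 10 of them trips the check.

```python
def word_spamprob(spamcount, hamcount, nspam, nham, s=0.45, x=0.5):
    """Hypothetical, simplified per-token spam probability.

    spamcount/hamcount: trained spam/ham messages containing the token.
    nspam/nham: spam/ham totals recorded in the database.
    s, x: assumed unknown-word strength and unknown-word probability.
    """
    nham = float(nham or 1)    # guard against dividing by zero
    nspam = float(nspam or 1)
    # A token cannot occur in more messages than were trained, so a
    # failure here means the token counts and the totals disagree.
    assert hamcount <= nham, "hamcount <= nham"
    assert spamcount <= nspam, "spamcount <= nspam"
    n = hamcount + spamcount
    if n == 0:
        return x               # never-seen token: fall back to x
    hamratio = hamcount / nham
    spamratio = spamcount / nspam
    prob = spamratio / (hamratio + spamratio)
    # Shrink the raw ratio toward x when the token has been seen rarely.
    return (s * x + n * prob) / (s + n)


# A token seen in all 44 spam and no ham scores close to 1.0.
print(word_spamprob(44, 0, nspam=44, nham=10))

# A token counted in 14 ham messages while the total says 10 ham.
try:
    word_spamprob(0, 14, nspam=44, nham=10)
except AssertionError as exc:
    print("AssertionError:", exc)
```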



----------------------------------------------------------------------

>Comment By: Tony Lownds (tonylownds)
Date: 2003-12-05 16:27

Message:
Logged In: YES 
user_id=24100

The web interface reports the same numbers on its front page as the 
command line does. The stats page did not load; it gave a 
traceback:

Traceback (most recent call last):
  File "/usr/lib/python2.2/site-packages/spambayes/Dibbler.py", line 457, in found_terminator
    getattr(plugin, name)(**params)
  File "/usr/lib/python2.2/site-packages/spambayes/UserInterface.py", line 926, in onStats
    s = Stats.Stats()
  File "/usr/lib/python2.2/site-packages/spambayes/Stats.py", line 42, in __init__
    self.CalculateStats()
  File "/usr/lib/python2.2/site-packages/spambayes/Stats.py", line 58, in CalculateStats
    for msg in msginfoDB.db:
  File "//usr/lib/python2.2/shelve.py", line 70, in __getitem__
    f = StringIO(self.dict[key])
TypeError: key type must be string
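This traceback shows `for msg in msginfoDB.db` ending up in the shelf's `__getitem__`. That is what Python does when an object defines `__getitem__` but no `__iter__`: it falls back to the old sequence protocol and calls `obj[0]`, `obj[1]`, and so on. An integer key then fails in the string-keyed database. The sketch below reproduces this behaviour with a hypothetical `LegacyShelf` class that stands in for the Python 2.2-era shelf. The message IDs are made up. Iterating over `keys()` explicitly avoids the fallback.

```python
class LegacyShelf:
    """Hypothetical stand-in for an old-style shelf: it has __getitem__
    and keys() but no __iter__, and accepts only string keys."""

    def __init__(self, data):
        self._data = dict(data)

    def keys(self):
        return list(self._data.keys())

    def __getitem__(self, key):
        if not isinstance(key, str):
            raise TypeError("key type must be string")
        return self._data[key]


db = LegacyShelf({"<msg-1@example>": "ham", "<msg-2@example>": "spam"})

# With no __iter__, "for msg in db" calls db[0] first and fails.
try:
    for msg in db:
        pass
except TypeError as exc:
    print("iteration failed:", exc)

# Iterating over the keys explicitly does not depend on __iter__.
for msg in db.keys():
    print(msg, db[msg])
```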


Training to a different folder seemed to work better, but there are 
still some discrepancies in the numbers. The commands below show my 
config (minus the password) and two training runs. The first run 
reports training 45 spam and 16 ham, but the second run reports the 
database as holding only 45 spam and 11 ham.

Note, I've added a few messages to my corpus since the original 
report.

[tony ~]$ grep -v password bayescustomize.ini
[imap]
server:localhost
username:tony
ham_train_folders:INBOX.TrainedHam
spam_train_folders:INBOX.TrainedSpam
spam_folder:INBOX.NewSpam
unsure_folder:INBOX.Unsure
move_trained_spam_to_folder:INBOX.Spam
move_trained_ham_to_folder:INBOX.Ham
[globals]
verbose:True
[tony ~]$ rm hammie.db spambayes.messageinfo.db
[tony ~]$ sb_imapfilter.py -t
SpamBayes IMAP Filter Beta1, version 0.1 (September 2003),
using SpamBayes IMAP Filter Web Interface Alpha2, version 0.02
and engine SpamBayes Beta2, version 0.2 (July 2003).

Loading state from hammie.db database
hammie.db is a new database
Loading database hammie.db... Done.
Training
   Training ham folder INBOX.TrainedHam
****************       16 trained.
   Training spam folder INBOX.TrainedSpam
*********************************************       45 trained.
Persisting hammie.db state in database
Training took 7.78080093861 seconds, 61 messages were trained
[tony@habib ~]$ sb_imapfilter.py -t
SpamBayes IMAP Filter Beta1, version 0.1 (September 2003),
using SpamBayes IMAP Filter Web Interface Alpha2, version 0.02
and engine SpamBayes Beta2, version 0.2 (July 2003).

Loading state from hammie.db database
hammie.db is an existing database, with 45 spam and 11 ham
Loading database hammie.db... Done.
Training
   Training ham folder INBOX.TrainedHam
       0 trained.
   Training spam folder INBOX.TrainedSpam
       0 trained.
Training took 0.0115129947662 seconds, 0 messages were trained


I can set up an account on my server with webmail and IMAP 
access if it would help with debugging. Email me directly if 
you're interested. Thanks



----------------------------------------------------------------------

Comment By: Tony Meyer (anadelonbrin)
Date: 2003-12-03 01:57

Message:
Logged In: YES 
user_id=552329

Sorry, I missed that (was that in the mailing list post, too?  I 
must have missed it twice).

So something is wrong, then: it reports that it trained on 58 
messages, but the db only has 54. If you use the web interface 
and look at the "stats" page, how many messages does it report 
there? (That goes off the messageinfo db rather than hammie.db.)

If you use the move_trained_[sp|h]am_to_folder options, do 
all the messages get moved?

I'm not sure whether the problem here is that it's not actually 
training all those messages, or that something is going wrong 
saving the increased count to the hammie.db.
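The comparison here is between the "N messages were trained" figure and the spam and ham totals that hammie.db reports on the next start-up. In the first report, 58 is compared with 44 + 10. The helper below makes that check repeatable across runs. It is a sketch only. `count_discrepancy` is a hypothetical name, and the regular expressions match the wording of the sb_imapfilter.py output quoted in this report, which may differ in other versions.

```python
import re

# Patterns copied from the console output quoted in this report; they are
# an assumption about the wording, not a documented output format.
TRAINED_RE = re.compile(r"(\d+) messages were\s+trained")
DB_COUNTS_RE = re.compile(r"with (\d+) spam and\s+(\d+)\s+ham")


def count_discrepancy(training_log, next_startup_log):
    """Hypothetical helper: messages reported as trained minus the
    spam + ham totals the database reports on the next start-up."""
    trained = TRAINED_RE.search(training_log)
    totals = DB_COUNTS_RE.search(next_startup_log)
    if trained is None or totals is None:
        raise ValueError("expected sb_imapfilter.py output lines not found")
    spam, ham = (int(n) for n in totals.groups())
    return int(trained.group(1)) - (spam + ham)


first = "Training took 2.87554502487 seconds, 58 messages were trained"
second = "hammie.db is an existing database, with 44 spam and 10 ham"
print(count_discrepancy(first, second))  # 58 - (44 + 10) = 4
```

The same check on the later transcript (61 trained, then 45 spam and 11 ham) gives 5.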

----------------------------------------------------------------------

Comment By: Tony Lownds (tonylownds)
Date: 2003-12-02 19:24

Message:
Logged In: YES 
user_id=24100

I think this should be re-opened. Please look carefully at the 
command below, which was included in the original report:

[tony ~]$ rm hammie.db spambayes.messageinfo.db

That command removes hammie.db as well.


----------------------------------------------------------------------

Comment By: Tony Meyer (anadelonbrin)
Date: 2003-12-01 23:34

Message:
Logged In: YES 
user_id=552329

As per Tim Stone's message on spambayes at python.org:

Removing the messageinfo db and not the stats db is the 
*cause* of this problem.  imapfilter relies on the messageinfo 
db to tell it which messages it should train on and which it 
has already processed.  By deleting that, but not your stats 
(hammie) db, you're in for all sorts of trouble.  You need to 
delete both if you want to start afresh.
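The toy model below sketches the bookkeeping described above and shows why the two databases have to be cleared together. It does not represent imapfilter's real code. `ToyFilter` is hypothetical: its `counts` plays the role of hammie.db, and its `messageinfo` plays the role of spambayes.messageinfo.db. A second training run skips messages already recorded, which matches the "0 trained" runs in the transcript. Clearing only `messageinfo` makes the same messages count twice.

```python
class ToyFilter:
    """Hypothetical model: `counts` stands in for hammie.db (training
    totals), `messageinfo` for spambayes.messageinfo.db (which message
    IDs have already been trained, and as what)."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Starting afresh means clearing both stores together.
        self.counts = {"ham": 0, "spam": 0}
        self.messageinfo = {}

    def train(self, msg_ids, label):
        trained = 0
        for msg_id in msg_ids:
            if self.messageinfo.get(msg_id) == label:
                continue  # already trained on this message: skip it
            self.counts[label] += 1
            self.messageinfo[msg_id] = label
            trained += 1
        return trained


f = ToyFilter()
ham = ["h%d" % i for i in range(16)]
print(f.train(ham, "ham"))  # 16: first run trains everything
print(f.train(ham, "ham"))  # 0: messageinfo says these are done
f.messageinfo.clear()       # delete only the messageinfo store...
print(f.train(ham, "ham"))  # 16 again
print(f.counts["ham"])      # 32: the same messages counted twice
```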

----------------------------------------------------------------------



