[spambayes-bugs] [ spambayes-Bugs-852137 ] sb_imapfilter.py
AssertionError: hamcount <= nham
SourceForge.net
noreply at sourceforge.net
Fri Dec 5 11:27:31 EST 2003
Bugs item #852137, was opened at 2003-12-01 15:39
Message generated for change (Comment added) made by tonylownds
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=852137&group_id=61702
Category: imapfilter
Group: Source code - CVS
Status: Open
Resolution: None
Priority: 5
Submitted By: Tony Lownds (tonylownds)
Assigned to: Tony Meyer (anadelonbrin)
Summary: sb_imapfilter.py AssertionError: hamcount <= nham
Initial Comment:
When I classify through sb_imapfilter.py, I am getting an
AssertionError. Any ideas? I am using spambayes from CVS;
courier IMAP; python 2.2.2; and a just-deleted database. See
below for commands, and further below for database dumps.
[tony ~]$ rm hammie.db spambayes.messageinfo.db
[tony ~]$ /usr/bin/sb_imapfilter.py -t
SpamBayes IMAP Filter Beta1, version 0.1 (September 2003),
using SpamBayes IMAP Filter Web Interface Alpha2, version 0.02
and engine SpamBayes Beta2, version 0.2 (July 2003).
Loading state from hammie.db database
hammie.db is a new database
Loading database hammie.db... Done.
Training
Training ham folder INBOX.Ham
************** 14 trained.
Training spam folder INBOX.Spam
******************************************** 44 trained.
Persisting hammie.db state in database
Training took 2.87554502487 seconds, 58 messages were trained
[tony ~]$ /usr/bin/sb_imapfilter.py -c
SpamBayes IMAP Filter Beta1, version 0.1 (September 2003),
using SpamBayes IMAP Filter Web Interface Alpha2, version 0.02
and engine SpamBayes Beta2, version 0.2 (July 2003).
Loading state from hammie.db database
hammie.db is an existing database, with 44 spam and 10 ham
Loading database hammie.db... Done.
Classifying *.
Traceback (most recent call last):
  File "/usr/bin/sb_imapfilter.py", line 821, in ?
    run()
  File "/usr/bin/sb_imapfilter.py", line 811, in run
    imap_filter.Filter()
  File "/usr/bin/sb_imapfilter.py", line 676, in Filter
    self.unsure_folder)
  File "/usr/bin/sb_imapfilter.py", line 595, in Filter
    evidence=True)
  File "/usr/lib/python2.2/site-packages/spambayes/classifier.py", line 158, in chi2_spamprob
    clues = self._getclues(wordstream)
  File "/usr/lib/python2.2/site-packages/spambayes/classifier.py", line 395, in _getclues
    prob = self.probability(record)
  File "/usr/lib/python2.2/site-packages/spambayes/classifier.py", line 242, in probability
    assert hamcount <= nham
AssertionError
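For context, the failing check can be reproduced in isolation. The sketch below is a simplified stand-in for the kind of per-token ratio guarded by the assertion in the traceback, not the actual SpamBayes code; the function name token_spam_ratio and its arguments are hypothetical. It assumes a token was seen in all 14 trained ham messages while the database records only 10 ham, as in the output above.

```python
def token_spam_ratio(hamcount, spamcount, nham, nspam):
    """Hypothetical, simplified per-token spam ratio.

    hamcount/spamcount: trained ham/spam messages containing the token.
    nham/nspam: total trained ham/spam messages recorded in the database.
    """
    # A token cannot appear in more ham messages than were trained in total.
    # This mirrors the assertion shown in the traceback.
    assert hamcount <= nham, "hamcount (%d) > nham (%d)" % (hamcount, nham)
    assert spamcount <= nspam, "spamcount (%d) > nspam (%d)" % (spamcount, nspam)

    hamratio = hamcount / float(nham or 1)
    spamratio = spamcount / float(nspam or 1)
    if hamratio + spamratio == 0:
        return 0.5  # token never seen: no evidence either way
    return spamratio / (hamratio + spamratio)


# Consistent totals: 14 ham were trained and 14 are recorded.
print(token_spam_ratio(hamcount=14, spamcount=0, nham=14, nspam=44))

# Inconsistent totals: 14 ham were trained but only 10 are recorded.
try:
    token_spam_ratio(hamcount=14, spamcount=0, nham=10, nspam=44)
except AssertionError as exc:
    print("AssertionError:", exc)
```

If this reading is right, the assertion is a symptom: the stored ham total is lower than the per-token counts written during training.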
----------------------------------------------------------------------
>Comment By: Tony Lownds (tonylownds)
Date: 2003-12-05 16:27
Message:
Logged In: YES
user_id=24100
The web interface reports the same numbers on its front page as
the command line does. The stats page did not load; it gave a
traceback:
Traceback (most recent call last):
  File "/usr/lib/python2.2/site-packages/spambayes/Dibbler.py", line 457, in found_terminator
    getattr(plugin, name)(**params)
  File "/usr/lib/python2.2/site-packages/spambayes/UserInterface.py", line 926, in onStats
    s = Stats.Stats()
  File "/usr/lib/python2.2/site-packages/spambayes/Stats.py", line 42, in __init__
    self.CalculateStats()
  File "/usr/lib/python2.2/site-packages/spambayes/Stats.py", line 58, in CalculateStats
    for msg in msginfoDB.db:
  File "//usr/lib/python2.2/shelve.py", line 70, in __getitem__
    f = StringIO(self.dict[key])
TypeError: key type must be string
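One possible reading of this traceback, offered as an assumption rather than a confirmed diagnosis: if the object being looped over defines __getitem__ but no __iter__, Python falls back to the old sequence protocol and calls __getitem__ with the integers 0, 1, 2 and so on, which a string-keyed store rejects. The class StringKeyedStore below is a hypothetical stand-in for the shelf, not SpamBayes or shelve code; iterating over keys() sidesteps the problem.

```python
class StringKeyedStore:
    """Hypothetical stand-in for a string-keyed store with no __iter__."""

    def __init__(self, data):
        self._data = dict(data)

    def keys(self):
        return list(self._data)

    def __getitem__(self, key):
        # Mimic a dbm-style backend that only accepts string keys.
        if not isinstance(key, str):
            raise TypeError("key type must be string")
        return self._data[key]


store = StringKeyedStore({"msg-1": "ham", "msg-2": "spam"})

# Direct iteration falls back to store[0], store[1], ... and fails.
try:
    for msg in store:
        pass
except TypeError as exc:
    print("TypeError:", exc)

# Iterating over the keys explicitly works.
for key in store.keys():
    print(key, store[key])
```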
Training to a different folder seemed to work better, although
some numbers still disagree. The commands below show my config
(sans password) and two training runs: the first run reports
training 45 spam and 16 ham, but on the second run the database
reports only 45 spam and 11 ham.
Note, I've added a few messages to my corpus since the original
report.
[tony ~]$ grep -v password bayescustomize.ini
[imap]
server:localhost
username:tony
ham_train_folders:INBOX.TrainedHam
spam_train_folders:INBOX.TrainedSpam
spam_folder:INBOX.NewSpam
unsure_folder:INBOX.Unsure
move_trained_spam_to_folder:INBOX.Spam
move_trained_ham_to_folder:INBOX.Ham
[globals]
verbose:True
[tony ~]$ rm hammie.db spambayes.messageinfo.db
[tony ~]$ sb_imapfilter.py -t
SpamBayes IMAP Filter Beta1, version 0.1 (September 2003),
using SpamBayes IMAP Filter Web Interface Alpha2, version 0.02
and engine SpamBayes Beta2, version 0.2 (July 2003).
Loading state from hammie.db database
hammie.db is a new database
Loading database hammie.db... Done.
Training
Training ham folder INBOX.TrainedHam
**************** 16 trained.
Training spam folder INBOX.TrainedSpam
********************************************* 45 trained.
Persisting hammie.db state in database
Training took 7.78080093861 seconds, 61 messages were trained
[tony at habib ~]$ sb_imapfilter.py -t
SpamBayes IMAP Filter Beta1, version 0.1 (September 2003),
using SpamBayes IMAP Filter Web Interface Alpha2, version 0.02
and engine SpamBayes Beta2, version 0.2 (July 2003).
Loading state from hammie.db database
hammie.db is an existing database, with 45 spam and 11 ham
Loading database hammie.db... Done.
Training
Training ham folder INBOX.TrainedHam
0 trained.
Training spam folder INBOX.TrainedSpam
0 trained.
Training took 0.0115129947662 seconds, 0 messages were trained
I can set up an account on my server with webmail and IMAP
access if it will help debug. Send me email directly if interested.
Thanks
----------------------------------------------------------------------
Comment By: Tony Meyer (anadelonbrin)
Date: 2003-12-03 01:57
Message:
Logged In: YES
user_id=552329
Sorry, I missed that (was that in the mailing list post, too? I
must have missed it twice).
So something is wrong: it reports that it trained on 58
messages, but the db holds only 54. If you use the web
interface and look at the "stats" page, how many messages
does it report there? (That goes off the messageinfo db
rather than hammie.db.)
If you use the move_trained_[sp|h]am_to_folder options, do
all the messages get moved?
I'm not sure whether the problem here is that it's not actually
training all those messages, or that something is going wrong
saving the increased count to the hammie.db.
----------------------------------------------------------------------
Comment By: Tony Lownds (tonylownds)
Date: 2003-12-02 19:24
Message:
Logged In: YES
user_id=24100
I think this should be re-opened. Please look at the command
below, included in the original report, carefully:
[tony ~]$ rm hammie.db spambayes.messageinfo.db
That command removes hammie.db as well.
----------------------------------------------------------------------
Comment By: Tony Meyer (anadelonbrin)
Date: 2003-12-01 23:34
Message:
Logged In: YES
user_id=552329
As per Tim Stone's message on spambayes at python.org:
Removing the messageinfo db and not the stats db is the
*cause* of this problem. imapfilter relies on the messageinfo
db to tell it which messages it should train on and which it
has already processed. By deleting that, but not your stats
(hammie) db, you're in for all sorts of trouble. You need to
delete both if you want to start afresh.
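To illustrate why the two files have to be deleted together, here is a minimal sketch. It assumes a trainer that skips any message already recorded as trained; train_folder, trained_ids and counts are hypothetical names standing in for the filter, the messageinfo db and the totals kept in hammie.db, not SpamBayes APIs.

```python
def train_folder(message_ids, trained_ids, counts, label):
    """Train only on messages not already recorded in trained_ids.

    Returns the number of messages newly trained in this run.
    """
    newly_trained = 0
    for msg_id in message_ids:
        if msg_id in trained_ids:
            continue  # already processed on an earlier run
        trained_ids.add(msg_id)
        counts[label] += 1
        newly_trained += 1
    return newly_trained


ham_folder = ["h1", "h2"]
counts = {"ham": 0, "spam": 0}   # stands in for the totals in hammie.db
trained_ids = set()              # stands in for the messageinfo db

print(train_folder(ham_folder, trained_ids, counts, "ham"))  # first run
print(train_folder(ham_folder, trained_ids, counts, "ham"))  # nothing new

# Simulate deleting only the messageinfo db: the totals survive,
# but the record of what was trained is gone.
trained_ids = set()
print(train_folder(ham_folder, trained_ids, counts, "ham"))
print(counts["ham"], "ham recorded for", len(ham_folder), "messages")
```

In this sketch the same two messages end up counted twice, so the stored totals no longer match the mailbox.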
----------------------------------------------------------------------