[spambayes-bugs] [ spambayes-Bugs-1101281 ] imapfilter with mysql on mac has assertion error

SourceForge.net noreply at sourceforge.net
Thu Jan 13 01:54:03 CET 2005


Bugs item #1101281, was opened at 2005-01-13 11:54
Message generated for change (Comment added) made by anadelonbrin
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=1101281&group_id=61702

Category: imapfilter
Group: 1.0.1
Status: Open
Resolution: None
Priority: 5
Submitted By: jscott (jscottjfs)
Assigned to: Tony Meyer (anadelonbrin)
Summary: imapfilter with mysql on mac has assertion error

Initial Comment:
Using persistent_use_database=False trains OK

Keeping everything else the same, and switching to
mysql leads to the following errors (or similar
assertion errors involving nspam instead) after a
couple minutes of training.

the mysql database has this:

mysql> describe bayes;
+-------+--------------+------+-----+---------+-------+
| Field | Type         | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| word  | varchar(255) |      | PRI |         |       |
| nspam | int(11)      |      |     | 0       |       |
| nham  | int(11)      |      |     | 0       |       |
+-------+--------------+------+-----+---------+-------+

and 


mysql> select count(word) from bayes;
+-------------+
| count(word) |
+-------------+
|       20125 |
+-------------+


so everything is working well.  Then, somehow, the training
runs amuck, crashing imapfilter and giving this:



[dhcp-235-023:~/spambayes-1.0.1/scripts] jscott% python sb_imapfilter.py -c -t -l -5
SpamBayes IMAP Filter Version 0.5 (November 2004)
and engine SpamBayes Engine Version 0.3 (January 2004).

Traceback (most recent call last):
  File "sb_imapfilter.py", line 924, in ?
    run()
  File "sb_imapfilter.py", line 914, in run
    imap_filter.Filter()
  File "sb_imapfilter.py", line 785, in Filter
    self.unsure_folder)
  File "sb_imapfilter.py", line 703, in Filter
    evidence=True)
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/spambayes/classifier.py", line 190, in chi2_spamprob
    clues = self._getclues(wordstream)
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/spambayes/classifier.py", line 493, in _getclues
    tup = self._worddistanceget(word)
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/spambayes/classifier.py", line 508, in _worddistanceget
    prob = self.probability(record)
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/site-packages/spambayes/classifier.py", line 308, in probability
    assert hamcount <= nham
AssertionError


----------------------------------------------------------------------

>Comment By: Tony Meyer (anadelonbrin)
Date: 2005-01-13 13:54

Message:
Logged In: YES 
user_id=552329

(Oops.  I skimmed your message too fast - there aren't any
values for nspam/nham there, just the defaults).  Most of my
earlier comment is still correct, anyway.

Try having a look (select * from bayes where word="saved
state") at the nham/nspam values.  They should be at least
as large as any individual counts.  You can manually correct
them if you like, but it's generally a better idea to
retrain from scratch.
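
The check described above can be sketched in Python roughly as
follows.  This is only an illustration under stated assumptions:
it uses an in-memory sqlite3 table with the same three columns as
a stand-in for the MySQL "bayes" table, and the helper names
(find_inconsistent_tokens, demo) and the sample rows are made up
for the example.  It lists every token whose nspam or nham count
is larger than the totals stored in the "saved state" row.

```python
import sqlite3

# Key of the row holding the trained-message totals, as in the query above.
STATE_KEY = "saved state"


def find_inconsistent_tokens(conn):
    """Return (word, nspam, nham) rows whose counts exceed the saved totals."""
    cur = conn.cursor()
    cur.execute("SELECT nspam, nham FROM bayes WHERE word = ?", (STATE_KEY,))
    row = cur.fetchone()
    # A missing state row is treated as totals of zero (assumption).
    total_nspam, total_nham = row if row is not None else (0, 0)
    cur.execute(
        "SELECT word, nspam, nham FROM bayes "
        "WHERE word <> ? AND (nspam > ? OR nham > ?) ORDER BY word",
        (STATE_KEY, total_nspam, total_nham),
    )
    return cur.fetchall()


def demo():
    """Build a tiny stand-in table (invented data) and run the check."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bayes ("
        "word VARCHAR(255) PRIMARY KEY, "
        "nspam INT DEFAULT 0, nham INT DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO bayes VALUES (?, ?, ?)",
        [
            ("saved state", 10, 5),  # totals: 10 spam, 5 ham trained
            ("viagra", 9, 0),        # consistent with the totals
            ("meeting", 1, 7),       # nham 7 > total 5: inconsistent
        ],
    )
    return find_inconsistent_tokens(conn)


if __name__ == "__main__":
    for word, nspam, nham in demo():
        print(word, nspam, nham)
```

An empty result means the stored totals are at least as large as
every individual count; any rows returned are the ones that would
trip the assertion.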

This might be a one-off problem, but if it does recur then
we can try to figure out what's causing the problem.  You
could also try using spambayes from CVS, which has a much
improved sb_imapfilter (which will be in the 1.1 release).

----------------------------------------------------------------------

Comment By: Tony Meyer (anadelonbrin)
Date: 2005-01-13 12:13

Message:
Logged In: YES 
user_id=552329

The problem is that nham (nspam) is meant to be the total
number of ham (spam) messages that you have trained.  It
looks like it's 0 above, which is not good.
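
As a rough illustration of why the assertion fires, here is a
simplified sketch of the invariant.  It is not the actual
classifier.py code: the function name token_spam_ratio and the
handling of zero totals are assumptions made for the example.  A
token cannot have been seen in more ham (spam) messages than were
trained in total, so a total of 0 alongside non-zero token counts
fails the check.

```python
def token_spam_ratio(spamcount, hamcount, nspam, nham):
    """Simplified per-token spam ratio (hypothetical helper).

    spamcount/hamcount: messages of each kind containing the token.
    nspam/nham: total messages of each kind that were trained.
    """
    # The invariant from the traceback: per-token counts can never
    # exceed the number of messages trained.
    assert hamcount <= nham
    assert spamcount <= nspam

    # Guard against division by zero (an assumption of this sketch).
    hamratio = hamcount / nham if nham else 0.0
    spamratio = spamcount / nspam if nspam else 0.0
    total = hamratio + spamratio
    # With no evidence either way, fall back to a neutral 0.5.
    return spamratio / total if total else 0.5


if __name__ == "__main__":
    print(token_spam_ratio(3, 1, 10, 10))
    try:
        token_spam_ratio(0, 2, 0, 0)  # totals of 0 with a non-zero count
    except AssertionError:
        print("AssertionError: hamcount > nham")
```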

Offhand, I'm not sure what would cause this - updating the
nham/nspam values is done at the same time as the token
counts, so if one is wrong, they really ought to both be.

I'll try to find time to replicate this here later today
and will update this report with what happens.

----------------------------------------------------------------------
