[Spambayes] Token seen in more ham than ham trained

John Hunter jdhunter at ace.bsd.uchicago.edu
Fri Jul 1 17:59:08 CEST 2005


 
I am using spambayes with nnml/gnus and started getting this error
message when filtering incoming mail (complete traceback below):

  File "/usr/lib/python2.4/site-packages/spambayes/classifier.py", line 314, in probability
    assert hamcount <= nham, "Token seen in more ham than ham trained."
AssertionError: Token seen in more ham than ham trained.
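
The assertion enforces that no token's ham count exceeds the number of
ham messages trained. Below is a minimal sketch of that consistency
check, using a plain dict in place of the real database. The function
name, the dict layout, and the sample counts are all assumptions made
for illustration; none of this is the spambayes API.

```python
def find_inconsistent_tokens(word_counts, nham, nspam):
    """Return tokens whose counts exceed the number of trained messages.

    word_counts maps token -> (spamcount, hamcount). This layout is a
    hypothetical stand-in for the real database, not the spambayes API.
    """
    bad = []
    for token, (spamcount, hamcount) in sorted(word_counts.items()):
        # Same invariant as the failing assertion, applied to both classes.
        if hamcount > nham or spamcount > nspam:
            bad.append(token)
    return bad


# Made-up counts: 'viagra' is consistent, while 'meeting' claims 5 ham
# sightings although only 3 ham messages were trained.
counts = {'viagra': (4, 0), 'meeting': (0, 5)}
inconsistent = find_inconsistent_tokens(counts, nham=3, nspam=4)
```

A check of this shape only reports which tokens violate the invariant.
It does not say how the counts drifted apart.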

The only mildly nonstandard thing I am doing with my database is
using a standalone script, sb_classify_nnml.py (below), to report the
classifier results, much like classify does in the web interface.  I
have this script bound to a key in my gnus summary buffer so I can get
a report on the current article.  If the script is indeed the cause of
this problem, is there a modification I can make to it that will
prevent the problem from recurring?

Is there a way to fix my database, or otherwise avoid this error,
other than retraining?

>>> spambayes.__version__
'1.1a1+'


Thanks!
JDH


### sb_classify_nnml.py

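#!/usr/bin/python
# Report the spambayes score and clues for a single nnml article.
# argv[1] has the form backend:group.name, argv[2] is the article file name.
import sys, os

from spambayes.tokenizer import tokenize
from spambayes.classifier import Bayes
from spambayes.hammie import Hammie
from spambayes import storage

# Open the training database in read-only mode.
db = os.path.join(os.environ['HOME'], '.hammiedb')
classifier = storage.open_storage(db, 'dbm', 'r')
hammie = Hammie(classifier)

# Turn the dotted group name into a directory path under ~/Mail.
nnml, relpath = sys.argv[1].split(':')
relpath = relpath.replace('.', os.sep)
fullpath = os.path.join(os.environ['HOME'], 'Mail', relpath, sys.argv[2])
message = file(fullpath).read()

# Score the message and collect the clues behind the score.
ptotal, clues = hammie.score(message, evidence=True)
print 'Probability spam', ptotal

# Print the clues sorted by probability, highest first.
swap = [ (p,word) for word, p in clues]
swap.sort()
swap.reverse()
for item in swap:
    print '    %1.1f : %s'%item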


### Traceback

Traceback (most recent call last):
  File "python/examples/spambayes_hammie.py", line 13, in ?
    ptotal, clues = hammie.score(message, evidence=True)
  File "/usr/lib/python2.4/site-packages/spambayes/hammie.py", line 62, in score
    return self._scoremsg(msg, evidence)
  File "/usr/lib/python2.4/site-packages/spambayes/hammie.py", line 38, in _scoremsg
    return self.bayes.spamprob(tokenize(msg), evidence)
  File "/usr/lib/python2.4/site-packages/spambayes/classifier.py", line 196, in chi2_spamprob
    clues = self._getclues(wordstream)
  File "/usr/lib/python2.4/site-packages/spambayes/classifier.py", line 499, in _getclues
    tup = self._worddistanceget(word)
  File "/usr/lib/python2.4/site-packages/spambayes/classifier.py", line 514, in _worddistanceget
    prob = self.probability(record)
  File "/usr/lib/python2.4/site-packages/spambayes/classifier.py", line 314, in probability
    assert hamcount <= nham, "Token seen in more ham than ham trained."
AssertionError: Token seen in more ham than ham trained.

