[Spambayes] Bug in classifier.py
Tony Meyer
tameyer at ihug.co.nz
Thu Oct 16 18:05:39 EDT 2003
> I have just downloaded version 1.0a6.1, and after some
> tinkering got it to work properly with procmail. However,
> every time I get a "ham" message, the sb_filter script
> generates a traceback:
[...]
> File "/u/bjorn/src/spambayes/spambayes/classifier.py", line
> 243, in probability
> assert hamcount <= nham
This means that the database is no longer consistent: one ham token,
"content-type:text/plain", has been counted in more ham messages than you
have trained on.
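To see which tokens trip the assertion, the check can be run over the whole
database at once. This is only a sketch: the dict layout, the function name
find_inconsistent_tokens and the sample counts are made up for illustration
and are not the classifier's real storage format.

```python
def find_inconsistent_tokens(wordinfo, nham, nspam):
    """Return tokens counted in more messages than were ever trained.

    wordinfo is a hypothetical mapping: token -> (hamcount, spamcount).
    nham and nspam are the totals of trained ham and spam messages.
    """
    bad = []
    for token, (hamcount, spamcount) in wordinfo.items():
        # Same invariant the assertion checks, applied to both classes.
        if hamcount > nham or spamcount > nspam:
            bad.append(token)
    return sorted(bad)


# Illustrative numbers only.
wordinfo = {
    "content-type:text/plain": (11, 3),
    "subject:meeting": (4, 0),
}
print(find_inconsistent_tokens(wordinfo, nham=10, nspam=5))
```

Any token this reports has a count that the message totals cannot explain.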
> Some debug printouts later, it seems that the token
> 'content-type:text/plain' is the bad guy - one of the 'ham'
> messages I used for training was a bounce message, which
> included the original email headers in the message body. The
> comments in classifier.py seem to indicate that this isn't a
> new issue either.
Do you mean that if you train on this message you get the
"content-type:text/plain" token incremented twice, and not once? A token
should only ever be incremented once per message, no matter how many times
it appears. If this is the case, then it's a serious bug.
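The once-per-message rule can be sketched like this. TinyCounts and
learn_ham are hypothetical names for illustration, not the real classifier
code; the point is only that the token stream is collapsed to a set before
counting.

```python
from collections import Counter


class TinyCounts:
    """Toy ham-only counter (hypothetical, not the Spambayes classifier)."""

    def __init__(self):
        self.nham = 0                # number of ham messages trained
        self.hamcounts = Counter()   # token -> number of ham messages containing it

    def learn_ham(self, tokens):
        self.nham += 1
        # set() collapses repeats, so a token that appears in both the
        # headers and a quoted bounce body is still counted once.
        for token in set(tokens):
            self.hamcounts[token] += 1


db = TinyCounts()
db.learn_ham(["content-type:text/plain", "bounce", "content-type:text/plain"])
print(db.nham, db.hamcounts["content-type:text/plain"])
```

With counting done this way, no token count can exceed nham through
training alone.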
In any case, the solution is to either retrain from scratch, or to flatten
your database to a text file (sb_dbexpimp.py), fix the totals at the top,
and then unflatten it again.
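If you go the flatten-and-fix route, the smallest totals that make the
assertion pass can be worked out from the token counts themselves. This is
a sketch that assumes you have already parsed the flattened file into a
list of (token, hamcount, spamcount) rows plus the two totals; the function
name and sample rows are made up, and the parsing step is left out because
it depends on the file layout.

```python
def minimal_totals(rows, nham, nspam):
    """Raise nham/nspam just enough that no token count exceeds them.

    rows is a hypothetical list of (token, hamcount, spamcount) tuples.
    """
    max_ham = max((ham for _, ham, _ in rows), default=0)
    max_spam = max((spam for _, _, spam in rows), default=0)
    # Never lower a total; only raise it to cover the largest count.
    return max(nham, max_ham), max(nspam, max_spam)


# Illustrative numbers only.
rows = [
    ("content-type:text/plain", 11, 3),
    ("subject:meeting", 4, 0),
]
print(minimal_totals(rows, nham=10, nspam=5))
```

This only restores consistency. The true totals are the number of messages
you actually trained on, so if you know those numbers, use them instead.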
=Tony Meyer