[Spambayes] hammiefilter just started failing
Gary Benson
gary at inauspicious.org
Wed Feb 12 10:50:45 EST 2003
Tim Stone - Four Stones Expressions wrote:
> The 'correct' solution to this problem is a retrain. But that's
> just not always gonna be practical. I've had problems with these
> assertions, but it seems that they should <raise
> eyebrow>always</raise eyebrow> be true...
Hmmm, I just realised that they are floating point numbers: it's quite
possible that the cause of the failure is compounded floating point
inaccuracies. Are one or both of the numbers in question derived by
repeatedly doing something? It doesn't necessarily have to happen in one
pass of the program if the number comes from the database.
I've seen things like this happen before, especially when adding
numbers of very different orders of magnitude together and then
expecting the result to make sense.
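To show the kind of thing I mean, here is a minimal sketch. It is not
taken from the Spambayes source, and the helper names repeated_sum and
absorbs are made up for illustration. It only demonstrates the two
general effects: a total built up by repeated addition drifting away
from the exact value, and a small number vanishing when it is added to
a much larger one.

```python
# Illustrative only: these helpers are hypothetical and are not part
# of Spambayes.

def repeated_sum(step, n):
    """Add `step` to a running float total `n` times."""
    total = 0.0
    for _ in range(n):
        total += step
    return total

def absorbs(big, small):
    """Return True if adding `small` to `big` leaves `big` unchanged."""
    return big + small == big

# Ten additions of 0.1 do not give exactly 1.0, because 0.1 has no
# exact binary representation and the error builds up.
print(repeated_sum(0.1, 10) == 1.0)   # False

# Above 2**53 a double cannot represent every integer, so the 1.0
# is lost entirely.
print(absorbs(1e16, 1.0))             # True
```

Whether either effect applies here depends on how spamcount and nspam
are actually stored and updated, which I haven't checked.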
Gary
> The statistics dudes are gonna have to weigh in on this one, because
> these parameters are used in the combining scheme, and fooling with
> them has consequences. Tim? Gary? Rob? - TimS
> 2/11/2003 6:03:05 AM, Gary Benson <gary at inauspicious.org> wrote:
>
> >I've been using Spambayes for about a fortnight with great success,
> >but last night it started to fail. Messages are being delivered (exim
> >has a safety net against pipes failing) but they are also being
> >bounced with the following traceback:
> >
> >| Traceback (most recent call last):
> >| File "/usr/bin/hammiefilter", line 134, in ?
> >| main()
> >| File "/usr/bin/hammiefilter", line 131, in main
> >| action()
> >| File "/usr/bin/hammiefilter", line 87, in filter
> >| print h.filter(msg)
> >| File "/usr/lib/python2.2/site-packages/spambayes/hammie.py", line 98, in filter
> >| prob, clues = self._scoremsg(msg, True)
> >| File "/usr/lib/python2.2/site-packages/spambayes/hammie.py", line 38, in _scoremsg
> >| return self.bayes.spamprob(tokenize(msg), evidence)
> >| File "/usr/lib/python2.2/site-packages/spambayes/classifier.py", line 217, in chi2_spamprob
> >| clues = self._getclues(wordstream)
> >| File "/usr/lib/python2.2/site-packages/spambayes/classifier.py", line 441, in _getclues
> >| prob = self.probability(record)
> >| File "/usr/lib/python2.2/site-packages/spambayes/classifier.py", line 304, in probability
> >| assert spamcount <= nspam
> >| AssertionError
> >
> >This is happening with all messages: a quick check shows that
> >spamcount is slightly higher than nspam (like 104 and 103) so I just
> >replaced the assertion with 'if spamcount > nspam: spamcount = nspam'
> >as a temporary workaround.
> >
> >Has anyone heard of this happening before? I'd like to know if this
> >is a known problem before I start trying to debug it...
> >
> >I have a copy of my .hammiedb (taken before I did the above tweak) if
> >you want it.
> >
> >Cheers,
> >Gary
> >
> >[ gary at inauspicious.org ][ GnuPG 85A8F78B ][ http://inauspicious.org/ ]
> >
> c'est moi - TimS
> http://www.fourstonesExpressions.com
> http://wecanstopspam.org
>
>
>
[ gary at inauspicious.org ][ GnuPG 85A8F78B ][ http://inauspicious.org/ ]