[spambayes-dev] RE: [Spambayes] Question (or possibly a bug report)
Tim Peters
tim.one at comcast.net
Thu Jul 24 01:34:37 EDT 2003
[Mark Hammond]
> OK, the code now looks like:
>
> print repr(S), repr(H)
> S = ln(S) + Sexp * LN2
> H = ln(H) + Hexp * LN2
>
> And I tested on a hammy mail. I got:
>
> 3,0955714375167259e-015 0.0
> ...
> File "E:\src\spambayes\spambayes\classifier.py", line 238, in
> chi2_spamprob
> H = ln(H) + Hexp * LN2
> exceptions.OverflowError: math range error
So H == 0.0 is the culprit. Unexpected!
> A spam yields:
> 0.0 0.0
> File "E:\src\spambayes\spambayes\classifier.py", line 237, in
> chi2_spamprob
> S = ln(S) + Sexp * LN2
> exceptions.OverflowError: math range error
So S == 0.0 irritated math.log first. Equally unexpected <wink>.
> Interestingly, S in the first one uses a comma, while all the zeroes
> got '.'
>
> Clueless ly,
Well, the last one is easy: *Python* adds the dot to 0. Python's repr()
for floats *generally* acts like C's %.17g, except for
repr(a_float_that_happens_to_be_an_exact_integer)
plus a couple others you don't want to hear about <wink>. Then C does
>>> "%.17g" % 0.0
'0'
>>>
and that violates Guido's desire that the *type* of an object be apparent
from its repr. So Python's format_float (in floatobject.c) first lets C
have a crack at it, and if C's sprintf didn't stick in a radix point, Python
appends its own, plus a trailing zero:
*cp++ = '.';
*cp++ = '0';
*cp++ = '\0';
Back to spambayes, H and S can't become zero <wink>. The only way they
could is if a computed probability is 0.0 or 1.0, and that's never supposed
to happen. Printing 'prob' in the loop would tell us whether that's so,
but, if it is so, the true cause could be in a ton of other code.
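To make the failure mode concrete, here is a minimal sketch of the kind of loop under discussion. It is reconstructed only from the two lines Mark quoted, so the function name log_products, the 1e-200 rescaling threshold, and the loop body are assumptions, not the actual classifier.py code. It is written for current Python, where math.log(0.0) raises ValueError rather than the OverflowError seen in the tracebacks above.

```python
import math

LN2 = math.log(2)


def log_products(probs):
    """Return (ln S, ln H), where S = prod(1 - p) and H = prod(p).

    The products are rescaled with frexp whenever they get tiny, so that
    long runs of small factors do not underflow to 0.0.
    """
    S = H = 1.0
    Sexp = Hexp = 0
    for prob in probs:
        S *= 1.0 - prob
        H *= prob
        if S < 1e-200:  # assumed threshold
            S, e = math.frexp(S)  # S is now in [0.5, 1), or exactly 0.0
            Sexp += e
        if H < 1e-200:
            H, e = math.frexp(H)
            Hexp += e
    # math.log raises here if S or H ended up as exactly 0.0.
    return math.log(S) + Sexp * LN2, math.log(H) + Hexp * LN2


# Many small probabilities are fine: rescaling keeps H away from 0.0.
print(log_products([0.01] * 500))

# A single probability of exactly 0.0 (or 1.0) is not: H (or S) becomes
# 0.0, frexp cannot rescue it, and the final log blows up.
try:
    log_products([0.0, 0.5])
except ValueError as exc:
    print("log failed:", exc)
```

Under these assumptions, underflow alone cannot produce the error, which is why a 0.0 or 1.0 among the per-word probabilities is the thing to look for.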