[Spambayes] Re: For the bold

Rob Hooft rob@hooft.net
Sat, 05 Oct 2002 17:26:34 +0200


---------------------- multipart/mixed attachment
Another large message.

Appended is a PDF containing six histograms made using 
max_discriminators=55.

The first one is zham for all ham messages. As you can see, the 
distribution is asymmetric. Furthermore, a simple average and standard 
deviation calculation results in a bell curve that does not follow the 
important tail of the histogram: the probability of values in that tail 
will be severely underestimated by these parameters.
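
To illustrate the tail problem, here is a minimal sketch on a synthetic, 
made-up sample (not the zham data from the PDF). It assumes a narrow core 
plus a long left tail, fits a bell curve from the plain mean and standard 
deviation, and compares the fitted tail probability with the observed 
fraction. The mixture, the threshold and all names are hypothetical.

```python
import math
import random
import statistics

random.seed(0)

# Synthetic, made-up sample (NOT the real zham scores): a narrow core
# around zero plus a long tail on the negative side.
sample = [random.gauss(0.0, 1.0) for _ in range(9000)]
sample += [-random.expovariate(1 / 5.0) for _ in range(1000)]

# Plain average and standard deviation, as in the simple bell-curve fit.
mu = statistics.mean(sample)
sigma = statistics.pstdev(sample)


def normal_lower_tail(x, mu, sigma):
    """P(X <= x) for a normal distribution with the given mean and sd."""
    return 0.5 * math.erfc((mu - x) / (sigma * math.sqrt(2.0)))


threshold = -10.0  # hypothetical cut-off far out in the tail
empirical = sum(1 for v in sample if v <= threshold) / len(sample)
fitted = normal_lower_tail(threshold, mu, sigma)

print("observed tail fraction: %.5f" % empirical)
print("bell-curve estimate:    %.5f" % fitted)
```

On a sample shaped like this, the bell-curve estimate comes out far below 
the observed fraction, which is the effect described above.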

The second one is abs(zham) for all ham messages. The bell curve fits 
this histogram much better!

The third page is zspam for all spam messages.

The fourth page is abs(zspam) for all spam messages. Also much better.

The fifth and sixth pages are zspam for all ham and zham for all spam, 
just to complete the picture.

From the second and fourth images, I drew the conclusion that my 
Z-scores are overestimated by a factor of about 6.7 (zham) and 6.6 
(zspam). This means e.g. that the zspam for all ham distribution is not 
-53 +/- 20, but -8 +/- 3 and the zham for all spam distribution is not 
-43 +/- 18, but -6.4 +/- 2.6.
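
The rescaling can be checked with a few lines of Python. This sketch 
assumes that "6.7/6.6" means a factor of 6.7 for zham and 6.6 for zspam; 
that reading is an assumption, and the names below are hypothetical.

```python
# Overestimation factors read off the abs() histograms.
# ASSUMPTION: 6.7 applies to zham and 6.6 to zspam.
ZHAM_FACTOR = 6.7
ZSPAM_FACTOR = 6.6


def rescale(mean, sd, factor):
    """Divide a raw z-score mean and spread by the overestimation factor."""
    return mean / factor, sd / factor


# Raw values quoted in the message: mean +/- standard deviation.
zspam_ham = rescale(-53.0, 20.0, ZSPAM_FACTOR)
zham_spam = rescale(-43.0, 18.0, ZHAM_FACTOR)

print("zspam for all ham: %.1f +/- %.1f" % zspam_ham)
print("zham for all spam: %.1f +/- %.1f" % zham_spam)
```

The results agree with the corrected figures above to within rounding.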

I will try a discriminator based on this.

Rob

-- 
Rob W.W. Hooft  ||  rob@hooft.net  ||  http://www.hooft.net/people/rob/

---------------------- multipart/mixed attachment
A non-text attachment was scrubbed...
Name: all.pdf
Type: application/pdf
Size: 56510 bytes
Desc: not available
Url : http://mail.python.org/pipermail-21/spambayes/attachments/20021005/df86955c/all.pdf

---------------------- multipart/mixed attachment--