[Spambayes] RE: Central Limit Theorem??!! :)

Tim Peters tim.one@comcast.net
Sat, 28 Sep 2002 01:57:25 -0400


[Anthony Baxter]
> Isn't there actually _two_ z scores? "how much does it look like ham"
> and "how much does it look like spam"?

Yes.

> (I might be mis-understanding the code here).

Nope, you got it.

> In this case, isn't the scoring going to need to handle
> "hey, both are greater than X sdevs,

Yes.  z-scores are highly non-linear, though, and converting both to
probabilities may (or may not) simplify reasoning about how best to combine
them.
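
A minimal sketch of what "converting to probabilities" could look like,
assuming each z-score is roughly standard normal (that's what the central
limit theorem suggests, not something the classifier guarantees).  The name
z_to_prob is made up for illustration and isn't from the spambayes code:

```python
import math

def z_to_prob(z):
    """Two-sided tail probability of a standard normal at z.

    Assumption: z behaves like a standard normal deviate.  The result is
    the chance of landing at least |z| sdevs from the mean, in either
    direction.
    """
    return math.erfc(abs(z) / math.sqrt(2.0))
```

Under that assumption, z = 3 comes out near 0.0027 and z = 4 near 0.00006,
which shows how non-linear the mapping is:  one more sdev costs a factor of
about 40.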

> set it to 50%"
>
> It seems like you'd actually end up with two cutoff numbers, or one
> as a difference from 50%...

The only three outcomes I've seen can be called "there's no chance this
is spam but some real chance it's ham", "there's no chance this is ham but
some real chance it's spam", and "oops!  best I can tell, there's no earthly
chance it's either".  The "there's some real chance it's ham and some real
chance it's spam" outcome has never happened clearly, except perhaps in very
short msgs (where it's common enough to see both z-scores at about 3 to 4).
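
One way those cases might be told apart, as a sketch only:  it assumes the
two z-scores have already been turned into probabilities somehow, and the
function name and the 0.001 cutoff are invented for illustration, not
tested values.

```python
def outcome(ham_prob, spam_prob, cutoff=0.001):
    """Name the case a pair of probabilities falls into.

    ham_prob / spam_prob: how plausible the msg is as ham / as spam.
    cutoff: hypothetical line between "no chance" and "some real chance".
    """
    ham_ok = ham_prob >= cutoff
    spam_ok = spam_prob >= cutoff
    if ham_ok and not spam_ok:
        return "ham"      # no chance it's spam, some real chance it's ham
    if spam_ok and not ham_ok:
        return "spam"     # no chance it's ham, some real chance it's spam
    if not ham_ok and not spam_ok:
        return "neither"  # the "oops!" case
    return "both"         # the case that has never happened clearly
```

Only the "neither" and "both" results raise Anthony's "set it to 50%"
question; the other two already have an obvious answer.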