[Spambayes] Cunning use of quoted-printable

Richie Hindle richie@entrian.com
Thu, 03 Oct 2002 17:44:00 +0100


[Tim]
> you've trained on exactly one message (this one) producing (among
> others) "word"
> 
>     'from:email addr:biglobe.ne.jp>'
> 
> The estimated *from counting* probability that a message containing this
> word is spam is then exactly 0.0 (you've seen it once, and only in ham).
> 
> Then Gary's Bayesian probability adjustment is applied [...]

<Grovels apologetically>  I did briefly think that this might be due to
this message having unique words, but I thought that the non-zero scores
for those words meant they must have appeared in a mix of ham and spam.  I
confess I've let the mathematical discussions slip past me, so I wasn't
expecting words unique to the ham corpus to have non-zero probabilities.
I should have looked more carefully at the words ('from:email
name:<rxmx7x5x1' is a big giveaway) and paid more attention to the maths.
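
For anyone else who let the maths slip past: here is a minimal sketch of
why a ham-only word still ends up with a non-zero score, assuming the
adjustment has the form of Gary Robinson's f(w) = (s*x + n*p) / (s + n).
The function name and the values s=1.0 and x=0.5 are illustrative
assumptions, not necessarily what the spambayes code uses.

```python
def robinson_adjust(p, n, s=1.0, x=0.5):
    """Blend the counting estimate p with a prior guess x.

    p: spam probability estimated purely from counting
    n: number of training messages the word appeared in
    s: strength given to the prior (assumed value)
    x: probability assumed for a never-seen word (assumed value)
    """
    return (s * x + n * p) / (s + n)


# A word seen once, and only in ham: the counting estimate is 0.0, n is 1.
print(robinson_adjust(0.0, 1))    # 0.25 with the assumed s and x

# The more ham-only sightings there are, the closer the score gets to 0.0.
print(robinson_adjust(0.0, 100))
```

So with only one sighting, the prior still carries as much weight as the
evidence, and the score lands well above zero.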

I will pay more attention in class.  I will pay more attention in class.
I will pay more attention in class.  I will pay more attention in class.
I will pay more attention in class.  I will pay more attention in class.
I will pay more attention in class.  I will pay more attention in class.
I will pay more attention in class.  I will pay more attention in class.
I will pay more attention in class.  I will pay more attention in class.
I will pay more attention in class.  I will pay more attention in class.
I will pay more attention in class.  I will pay more attention in class.
I will pay more attention in class.  I will pay more attention in class.
I will pay more attention in class.  I will pay more attention in class.

Many thanks for the explanation, and sorry to have wasted your time.

-- 
Richie Hindle
richie@entrian.com