[Spambayes] Cunning use of quoted-printable
Richie Hindle
richie@entrian.com
Thu, 03 Oct 2002 17:44:00 +0100
[Tim]
> you've trained on exactly one message (this one) producing (among
> others) "word"
>
> 'from:email addr:biglobe.ne.jp>'
>
> The estimated *from counting* probability that a message containing this
> word is spam is then exactly 0.0 (you've seen it once, and only in ham).
>
> Then Gary's Bayesian probability adjustment is applied [...]
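To make the quoted point concrete, here is a minimal sketch of the kind of adjustment Tim is referring to, assuming it takes the form commonly attributed to Gary Robinson: f(w) = (s*x + n*p(w)) / (s + n). The function name adjusted_prob and the values s = 0.45 and x = 0.5 are assumptions chosen for illustration, not necessarily the settings in use at the time. It shows why a word seen once, and only in ham, ends up with a non-zero spam probability.

```python
def adjusted_prob(counted_prob, n, s=0.45, x=0.5):
    """Blend a from-counting probability with a prior belief.

    counted_prob: spam probability estimated purely from counts
    n: number of training messages the word appeared in
    s: strength given to the prior (assumed value)
    x: probability assumed for a word never seen before (assumed value)
    """
    return (s * x + n * counted_prob) / (s + n)


# A word seen in exactly one message, and only in ham:
# the counted probability is 0.0, but the adjusted one is not.
print(adjusted_prob(0.0, 1))   # roughly 0.155

# With more ham-only evidence, the estimate moves closer to 0.0.
print(adjusted_prob(0.0, 20))
```

With only one sighting the prior still carries real weight, so the score sits well above zero; it only approaches the counted value as n grows.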
<Grovels apologetically> I did briefly think that this might be due to
this message having unique words, but I thought that the non-zero scores
for those words meant they must have appeared in a mix of ham and spam. I
confess I've let the mathematical discussions slip past me, so I wasn't
expecting words unique to the ham corpus to have non-zero probabilities.
I should have looked more carefully at the words ('from:email
name:<rxmx7x5x1' is a big giveaway) and paid more attention to the maths.
I will pay more attention in class. I will pay more attention in class.
I will pay more attention in class. I will pay more attention in class.
I will pay more attention in class. I will pay more attention in class.
I will pay more attention in class. I will pay more attention in class.
I will pay more attention in class. I will pay more attention in class.
I will pay more attention in class. I will pay more attention in class.
I will pay more attention in class. I will pay more attention in class.
I will pay more attention in class. I will pay more attention in class.
I will pay more attention in class. I will pay more attention in class.
I will pay more attention in class. I will pay more attention in class.
Many thanks for the explanation, and sorry to have wasted your time.
--
Richie Hindle
richie@entrian.com