[Spambayes] how spambayes handles image-only spams
Bill Yerazunis
wsy at merl.com
Sat Sep 6 11:29:07 EDT 2003
From: "Ryan Malayter" <rmalayter at bai.org>
> Statistically speaking, HTML mail is
> either from a spammer or from a clueless
> git, and in either case can usually be
> delayed without penalty or discarded outright.
As indicated above, I do not think this analysis is true anymore. And
characterizing someone as a clueless git because they don't change their
mail client's default message format or "love" plain text... Well, let
us know when you get back to the real world.
Um... you're arguing the politics of desire against actual measured
statistics.
In my current CRM114 corpus (which is running in real time and delivering
better accuracy than I myself can deliver, well over 99.9%):
SingleToken    Spam    Nonspam
<p>              49          0
<br>            207         32
<td>             48          0
<font            57          2
<a              117          2
Other HTML tokens have similar statistics. The margin of error on
each of these (aliasing probability) is 1/2^64, in other words, a
few billionths of a billionth of a percent of a chance that this is
due to aliasing in the database.
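
For anyone who wants to check the arithmetic, here is a minimal Python
sketch. It turns the counts above into a smoothed per-token spam estimate
and works out the chance of a 64-bit hash collision. The smoothing is a
Robinson-style adjustment picked only for illustration; it is not claimed
to be what CRM114 computes, and the names spam_fraction, prior and strength
are hypothetical. It also assumes the raw counts can be compared directly,
i.e. that the spam and nonspam corpora are of similar size, which the
figures above do not state.

```python
# Counts copied from the table above: token -> (spam count, nonspam count).
COUNTS = {
    "<p>": (49, 0),
    "<br>": (207, 32),
    "<td>": (48, 0),
    "<font": (57, 2),
    "<a": (117, 2),
}


def spam_fraction(spam, nonspam, prior=0.5, strength=1.0):
    """Smoothed estimate of P(spam | token).

    Assumption: Robinson-style smoothing, (s*x + spam) / (s + n), used here
    for illustration only. It keeps tokens with zero nonspam hits from
    scoring exactly 1.0.
    """
    n = spam + nonspam
    return (strength * prior + spam) / (strength + n)


# Chance that two distinct features collide in a 64-bit hash slot.
ALIAS_PROBABILITY = 2.0 ** -64

if __name__ == "__main__":
    for token, (spam, nonspam) in COUNTS.items():
        print(f"{token:<8} {spam_fraction(spam, nonspam):.3f}")
    print(f"aliasing probability: {ALIAS_PROBABILITY:.3e}")
    print(f"as a percentage:      {ALIAS_PROBABILITY * 100:.3e} %")
```

Even with the smoothing pulling every estimate toward 0.5, each of these
tokens lands well above 0.85 under those assumptions.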
E pur si moivre, dude. E pur si moivre.
-Bill Yerazunis