[Spambayes] how spambayes handles image-only spams

Bill Yerazunis wsy at merl.com
Sat Sep 6 11:29:07 EDT 2003


   From: "Ryan Malayter" <rmalayter at bai.org>

   > Statistically speaking, HTML mail is 
   > either from a spammer or from a clueless 
   > git, and in either case can usually be 
   > delayed without penalty or discarded outright.

   As indicated above, I do not think this analysis is true anymore. And
   characterizing someone as a clueless git because they don't change their
   mail client's default message format or "love" plain text... Well, let
   us know when you get back to the real world. 

Um.... you're arguing politics of desire against actual measured
statistics.  

In my current CRM114 corpus (which is running in real time and
delivering better accuracy than I myself can deliver, well over 99.9%):

SingleToken  Spam  Nonspam

<p>          49    0
<br>         207   32
<td>         48    0
<font        57    2
<a           117   2
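
For a rough sense of what those counts mean, here is a minimal Python sketch that turns each row into the share of that token's occurrences seen in spam. This is only a naive ratio for illustration: it is not the formula CRM114 or Spambayes actually uses, it does not normalize for the relative sizes of the spam and nonspam corpora (not given here), and the names `counts` and `spam_fraction` are made up for this example.

```python
# Token counts copied from the table above: (spam, nonspam).
counts = {
    "<p>": (49, 0),
    "<br>": (207, 32),
    "<td>": (48, 0),
    "<font": (57, 2),
    "<a": (117, 2),
}


def spam_fraction(spam, nonspam):
    """Naive share of a token's occurrences that were in spam.

    Illustrative only; not CRM114's or Spambayes' scoring formula.
    """
    return spam / (spam + nonspam)


fractions = {tok: spam_fraction(s, n) for tok, (s, n) in counts.items()}

for tok, frac in fractions.items():
    print(f"{tok:<8}{frac:6.1%}")
```

Even the weakest of these, <br>, comes out at 207 of 239 occurrences in spam.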

Other HTML tokens have similar statistics.  The margin of error on
each of these (aliasing probability) is about 1/2^64, in other words, a
vanishingly small chance (roughly 5 x 10^-18 percent) that this is due
to aliasing in the database.
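
As a quick check on the size of that aliasing figure, the arithmetic can be sketched as below. It assumes the figure means the chance that two distinct features hash to the same value under a uniformly distributed 64-bit hash; the 64 comes from the 2^64 above, and the names `HASH_BITS` and `p_alias` are hypothetical.

```python
HASH_BITS = 64  # taken from the 2^64 figure in the text

# Assumption: a uniform hash, so any two distinct features collide
# with probability 1 / 2**HASH_BITS.
p_alias = 1 / 2**HASH_BITS

print(f"aliasing probability: {p_alias:.3e}")        # 5.421e-20
print(f"as a percentage:      {p_alias * 100:.3e}")  # 5.421e-18
```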

E pur si moivre, dude.  E pur si moivre.

  -Bill Yerazunis



