[Spambayes] how spambayes handles image-only spams

Bill Yerazunis
Sun Sep 7 21:56:59 EDT 2003

   From: "Tim Peters" <tim.one at comcast.net>

   You two get very different kinds of email mixes, and that's all there is to
   this.  And someone who uses Emacs to read email is by definition too geeky
   to be representative of anyone in the real world <wink>.

No, I use Emacs to read the mail so that I can see what the spammers
are up to.  For normal use (i.e. when I'm flirting with someone), I
use Yahoo Mail.  :)

   > E pur si moivre, dude.  E pur si moivre.

   There are more kinds of email users in heaven and earth than are dreamt of
   in your classifier, Bill.

That's one thing I _like_ about this list.  At least y'all are moderately
literate.  :-)

   I don't get an email mix anything like my sisters get either,
   except for the spam.  I know they got a lot more HTML ham than I
   get, and sounds like Ryan gets even more than they do.  So it goes
   -- people are different.

So noted.  :-)

Well, on the grounds that the SpamAssassin corpus is a little less
biased, I re-ran the tests against the .css files that the SA test
corpus generates (using the TOE learning strategy).  Accuracy on this
corpus is just over 98% for crm114, and barely 70% for me-the-human.

The results for the SA test corpus:

Token    Spam   Nonspam
<p>       143       144
<br>      380       289
<td>       67       119
<font     305       281
<a        218       346

So, it seems that <font is somewhat spammy, and so is <br>,
but <a and <td> lean toward nonspam, and <p> is totally equivocal.
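For anyone who wants to check that reading of the table, here is a rough sketch that turns each row into a naive spam fraction, spam / (spam + nonspam), and sorts the tokens by it. This is only arithmetic on the counts above, not how crm114 or spambayes actually score tokens, and it assumes the spam and nonspam halves of the corpus are of comparable size (the raw counts are not normalized). The names COUNTS, spam_fraction and rank_tokens are made up for illustration.

```python
# Counts copied from the table above: token -> (spam, nonspam).
COUNTS = {
    "<p>": (143, 144),
    "<br>": (380, 289),
    "<td>": (67, 119),
    "<font": (305, 281),
    "<a": (218, 346),
}


def spam_fraction(spam, nonspam):
    """Naive share of a token's count that falls on the spam side.

    Assumption: both halves of the corpus are about the same size,
    so no normalization is applied to the raw counts.
    """
    return spam / (spam + nonspam)


def rank_tokens(counts):
    """Return (token, fraction) pairs, spammiest first."""
    scored = [(tok, spam_fraction(s, n)) for tok, (s, n) in counts.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for tok, frac in rank_tokens(COUNTS):
        print(f"{tok:<6} {frac:.3f}")
```

A fraction near 0.5 means the token says nothing either way; the further from 0.5, the stronger the lean toward spam (above) or nonspam (below).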

Does this help?  :)

     -Bill Yerazunis

