[Spambayes] how spambayes handles image-only spams
Bill Yerazunis
wsy at merl.com
Sun Sep 7 21:56:59 EDT 2003
From: "Tim Peters" <tim.one at comcast.net>
> You two get very different kinds of email mixes, and that's all there is to
> this. And someone who uses Emacs to read email is by definition too geeky
> to be representative of anyone in the real world <wink>.
No, I use Emacs to read the mail so that I can see what the spammers
are up to. For normal use (i.e. when I'm flirting with someone), I
use Yahoo Mail. :)
>> E pur si moivre, dude. E pur si moivre.
> There are more kinds of email users in heaven and earth than are dreamt of in
> your classifier, Bill.
That's one thing I _like_ about this list. At least y'all are moderately
literate. :-)
> I don't get an email mix anything like my sisters get either,
> except for the spam. I know they got a lot more HTML ham than I
> get, and sounds like Ryan gets even more than they do. So it goes
> -- people are different.
So noted. :-)
Well, on the grounds that the SpamAssassin corpus is a little less
biased, I re-ran the tests against the .css files that the SA test
corpus generates (using the TOE, or Train Only Errors, learning strategy).
Accuracy on this corpus is just over 98% for crm114, and barely 70% for
me-the-human.
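For readers who haven't met TOE: the idea is that the classifier learns from a message only when it gets that message wrong. The sketch below is a toy illustration of that loop, not crm114's classifier. ToyClassifier, run_toe, and the four sample messages are all hypothetical, and counting accuracy over the single online training pass is an assumption made for simplicity.

```python
from collections import Counter


class ToyClassifier:
    """Hypothetical token-count classifier, for illustration only."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}

    def classify(self, tokens):
        # Sum how often each class has seen these tokens; ties go to ham.
        spam = sum(self.counts["spam"][t] for t in tokens)
        ham = sum(self.counts["ham"][t] for t in tokens)
        return "spam" if spam > ham else "ham"

    def train(self, tokens, label):
        self.counts[label].update(tokens)


def run_toe(classifier, messages):
    """Train Only Errors: learn from a message only if it was misclassified."""
    errors = 0
    for tokens, truth in messages:
        if classifier.classify(tokens) != truth:
            errors += 1
            classifier.train(tokens, truth)
    accuracy = 1 - errors / len(messages)
    return accuracy, errors


# Hypothetical mini-corpus: (tokens, true label) pairs.
messages = [
    (["<font", "<br>"], "spam"),
    (["<a", "<td>"], "ham"),
    (["<font", "<br>"], "spam"),
    (["<a", "<p>"], "ham"),
]
print(run_toe(ToyClassifier(), messages))
```

Only the first message is misclassified (an untrained classifier defaults to ham), so it is the only one trained on. That is the point of TOE: correctly handled mail adds nothing to the model.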
The results for the SA test corpus:

Token     Spam   Nonspam
<p>        143       144
<br>       380       289
<td>        67       119
<font      305       281
<a         218       346
So it seems that "<font" is somewhat spammy, and so is "<br>",
but "<a" and "<td>" aren't (both lean nonspam), and "<p>" is totally
equivocal.
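One rough way to check that reading is to compute, for each token, the fraction of its occurrences that fell on the spam side. The sketch below assumes the spam and nonspam sides are roughly the same size (the post doesn't give corpus sizes, so no normalization is applied), and the 0.01 "equivocal" margin is an arbitrary, hypothetical choice. This is a plain ratio, not the Spambayes scoring formula.

```python
# Raw counts copied from the SA test corpus table: token -> (spam, nonspam).
COUNTS = {
    "<p>": (143, 144),
    "<br>": (380, 289),
    "<td>": (67, 119),
    "<font": (305, 281),
    "<a": (218, 346),
}


def spam_ratio(spam: int, nonspam: int) -> float:
    """Fraction of a token's occurrences seen in spam.

    Assumes equally sized spam and nonspam samples; otherwise each count
    should first be divided by its side's total.
    """
    return spam / (spam + nonspam)


def label(ratio: float, margin: float = 0.01) -> str:
    # Hypothetical threshold: within +/- margin of 0.5 counts as equivocal.
    if ratio > 0.5 + margin:
        return "spam"
    if ratio < 0.5 - margin:
        return "nonspam"
    return "equivocal"


for token, (s, n) in COUNTS.items():
    r = spam_ratio(s, n)
    print(f"{token:7} {r:.3f} leans {label(r)}")
```

Under these assumptions "<br>" comes out near 0.57, "<font" near 0.52, "<a" and "<td>" well under 0.5, and "<p>" almost exactly 0.5, which matches the reading above.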
Does this help? :)
-Bill Yerazunis