[Spambayes] how spambayes handles image-only spams

Tim Peters tim.one at comcast.net
Sun Sep 7 21:26:53 EDT 2003

[Bill Yerazunis]
>>> Statistically speaking, HTML mail is either from a spammer or from a
>>> clueless git, and in either case can usually be delayed without
>>> penalty or discarded outright.

[Ryan Malayter]
>> As indicated above, I do not think this analysis is true anymore.
>> And characterizing someone as a clueless git because they don't
>> change their mail client's default message format or "love" plain
>> text... Well, let us know when you get back to the real world.

> Um.... you're arguing politics of desire against actual measured
> statistics.
> In my current CRM114 corpus (which is running realtime and delivering
> better accuracy than I myself can deliver- well over 99.9%):
> SingleToken   Spam   Nonspam
> <p>             49         0
> <br>           207        32
> <td>            48         0
> <font           57         2
> <a             117         2
> Other HTML tokens have similar statistics.  The margin of error on
> each of these (aliasing probability) is 1/2^64, in other words, far
> less than a billionth of a percent of a chance that this is due to
> aliasing in the database.
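
For anyone who wants to poke at those numbers, here is a minimal sketch that turns the counts quoted above into raw per-token spam fractions. The name spam_fraction is made up for illustration, and the plain ratio is an assumption on my part: it is not how CRM114 or spambayes actually scores a token, and it ignores how many spam and nonspam messages were in the corpus.

```python
# Counts copied from the quoted CRM114 table: token -> (spam, nonspam).
counts = {
    "<p>": (49, 0),
    "<br>": (207, 32),
    "<td>": (48, 0),
    "<font": (57, 2),
    "<a": (117, 2),
}


def spam_fraction(spam, nonspam):
    """Raw share of a token's occurrences seen in spam (no smoothing).

    Hypothetical helper for illustration only; real classifiers adjust
    for corpus size and for tokens with few observations.
    """
    return spam / (spam + nonspam)


fractions = {tok: spam_fraction(s, n) for tok, (s, n) in counts.items()}

for tok, frac in fractions.items():
    # Left-align the token, then show the fraction to three places.
    print(f"{tok:<8}{frac:.3f}")
```

These fractions describe one person's mailbox and nothing more; the same arithmetic on a different mix of mail can come out quite differently.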

You two get very different kinds of email mixes, and that's all there is to
this.  And someone who uses Emacs to read email is by definition too geeky
to be representative of anyone in the real world <wink>.

> E pur si moivre, dude.  E pur si moivre.

There are more kinds of email users in heaven and earth than are dreamt of in
your classifier, Bill.  I don't get an email mix anything like my sisters
get either, except for the spam.  I know they got a lot more HTML ham than I
get, and sounds like Ryan gets even more than they do.  So it goes -- people
are different.

