[Spambayes] Gmail

skip at pobox.com
Thu Aug 25 19:22:37 CEST 2011


    Jim> I get much less spam now than I did then.  Gmail's spam filter does
    Jim> a better job, because you have hundreds of millions of users
    Jim> training it instead of just you.

I'd be interested to see any references you might have about the technology
behind Gmail's spam filter.  I found this:

  http://mail.google.com/mail/help/intl/en/fightspam/spamexplained.html

That suggests user input is a key component to their spam technology, but
they say nothing about how they use it.  There is also clearly a personal
component in there somewhere.  It can't just be the wisdom of the crowd.  I
suspect input from Gmail users is just one component of the overall
decision-making process.  In SpamBayes parlance, user input might just be
another synthetic token to be consumed by a probabilistic classifier.
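
To make the "synthetic token" idea concrete, here is a minimal sketch in
Python.  Everything in it is an assumption for illustration: the token
names, the probabilities, and the plain naive-Bayes log-odds combination
(SpamBayes itself uses chi-squared combining, and nothing here is known to
reflect what Gmail does).  It only shows how a crowd verdict could enter
the classifier as one more clue among many.

```python
import math


def combine(probs):
    """Combine per-token spam probabilities with a naive-Bayes log-odds sum.

    This is a simplification chosen for the sketch, not SpamBayes'
    chi-squared combining.
    """
    log_odds = sum(math.log(p / (1.0 - p)) for p in probs)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical per-token spam probabilities learned from one user's mail.
token_probs = {
    "subject:interesting": 0.30,
    "from:skip": 0.20,
    "url:example": 0.60,
}


def score(token_probs, crowd_prob=None):
    """Score a message; crowd_prob, if given, is the hypothetical
    synthetic token carrying other users' spam reports."""
    probs = list(token_probs.values())
    if crowd_prob is not None:
        probs.append(crowd_prob)
    return combine(probs)


without_crowd = score(token_probs)
with_crowd = score(token_probs, crowd_prob=0.90)
print(f"personal tokens only: {without_crowd:.3f}")
print(f"plus crowd token:     {with_crowd:.3f}")
```

In this toy case the crowd token pushes the score up, but it is weighed
against the personal tokens like any other clue.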

To support that contention I offer one anecdote and one hypothetical
scenario.

Anecdote: My wife and I both use Gmail.  From time to time I send her stuff
I think she might find interesting.  The other day I asked if she'd seen one
of these messages.  She hadn't.

  Skip: Have you checked your spam?
  Ellen: I never look at my spam.

I looked at her spam and found five messages from me over the previous week
or two.  My guess is that she checked off a bunch of messages in her inbox,
including one from me, and accidentally poked the Spam button instead of the
adjacent Delete button.  From her perspective, they appear to work about the
same.  Messages disappear from her inbox.  Still, she was telling Gmail that
my message was spam, and it responded accordingly.  On the flip side, while
I can believe that some people on the net don't like mail I send, I doubt
enough people are flagging my mails as spam for a random email I send to
some other Gmail user to be classified as spam without further input from
that user.

Hypothetical scenario: 99.99% of the male Gmail-reading population probably
aren't interested in "be better in bed" mails and classify any that slip
through as spam, but there is that 0.01% who do find such mails very
interesting.  (If nobody thought they were interesting, the spammers wouldn't
make any money off them and would stop sending them.)  If, on the other hand, that
0.01% fraction say, "hey, that ain't spam, that's good stuff!", then Gmail
probably responds very accurately to their preferences.  It has to respond
on a personalized basis, and I think this has to be the largest
consideration, more than the 99.99% of the male Gmail population saying they
are perfectly happy with their performance in bed, thank you very much.
Otherwise, such individual preferences could never override the wisdom of
the crowd.
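
One way such an override could work, sketched under stated assumptions:
treat the crowd's verdict as a prior and let a user's own feedback pull the
estimate away from it.  The formula below has the same shape as the
Bayesian adjustment SpamBayes applies to rarely seen tokens, but using the
crowd probability as the prior, the function name, and the numbers are all
hypothetical, and none of it is known to describe Gmail.

```python
def adjusted_prob(personal_spam, personal_ham, prior, strength=1.0):
    """Blend a prior belief with one user's own spam/ham counts.

    prior    -- hypothetical crowd-derived spam probability
    strength -- how many personal observations the prior is worth
    """
    n = personal_spam + personal_ham
    if n == 0:
        # No personal feedback yet: fall back on the crowd.
        return prior
    p = personal_spam / n
    return (strength * prior + n * p) / (strength + n)


crowd_prior = 0.9999  # almost everyone calls this kind of mail spam

no_feedback = adjusted_prob(0, 0, crowd_prior)
after_three_ham = adjusted_prob(0, 3, crowd_prior)
print(f"no personal feedback:     {no_feedback:.4f}")
print(f"after three 'not spam's:  {after_three_ham:.4f}")
```

With a weak prior like this, a handful of "not spam" clicks from one user
is enough to outweigh the crowd for that user alone.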

Skip

