[Spambayes] What is spam?

Neale Pickett neale@woozle.org
16 Sep 2002 14:44:26 -0700


My datasets aren't as pure as I thought :( While sorting through my FNs
and FPs, I've found some trends:

1.  When people forward spam to me, it gets tagged as spam.  I have a
    lot of forwarded spam in my inbox; I've asked people to send me
    stuff in the past so I can get a feel for what sort of spam is being
    sent to my users.  I use this feel to blacklist domains in the MAIL
    FROM: SMTP command.  It works pretty well when I stay on top of it.

2.  Non-spam I'd erroneously entered into my spam corpus accounts for
    most of the false negatives.  Neato!

3.  Stupid forwards (mostly urban legends, exhortations to
    pray/boycott/vote a certain way, jokes, or inspirational stories)
    are not tagged as spam.  I get a lot of these too, from my
    grandmother and certain friends who seem to do nothing but relay
    chain letters.  But I don't get enough to train the filter against
    them, apparently.

With only one or two exceptions, that is the extent of my false
positives and false negatives.
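
For the curious, the domain blacklisting from item 1 could look roughly
like the following.  This is only a sketch under assumptions: the names
BLACKLIST, sender_domain and is_blacklisted are made up for illustration,
the domains are placeholders, and a real setup would do this check inside
the mail server rather than in a standalone script.

```python
# Hypothetical hand-maintained blacklist; the domains are placeholders.
BLACKLIST = {"spammer.example", "bulk.example"}


def sender_domain(mail_from):
    """Return the lowercased domain of a MAIL FROM address, or '' if none."""
    address = mail_from.strip().strip("<>")
    _, sep, domain = address.rpartition("@")
    return domain.lower() if sep else ""


def is_blacklisted(mail_from, blacklist=BLACKLIST):
    """True if the sender's domain or any parent domain is blacklisted."""
    parts = sender_domain(mail_from).split(".")
    # For "a.b.example", check "a.b.example", then "b.example", then "example".
    return any(".".join(parts[i:]) in blacklist for i in range(len(parts)))
```

Checking parent domains as well means that listing "spammer.example" also
catches senders at "mail.spammer.example".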

I have to wonder, though, if the forwards (#3) are really false
negatives.  Should I have those in the ham folder, and be using another
method to weed out garbage of that type?  I'm not sure the current
classifier is up to sorting out urban legends :)

In any case, it's becoming clear to me that in the future when we're all
trying to help our grandmothers install spambayes, there will have to be
some way of reviewing FPs and FNs the way we're all doing it now.  In my
case at least, a lot of FPs and FNs aren't really F.
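
A review step like that might be sketched as below.  This is not how
spambayes does it; it assumes you already have, for each message id, the
folder you filed it in and the classifier's verdict, both as plain dicts.
The function names find_disagreements and apply_review are hypothetical.

```python
def find_disagreements(labels, verdicts):
    """Return (fps, fns): message ids where the filing and the classifier differ.

    labels and verdicts map message id -> True if spam.
    """
    fps = sorted(m for m in labels if not labels[m] and verdicts[m])
    fns = sorted(m for m in labels if labels[m] and not verdicts[m])
    return fps, fns


def apply_review(labels, corrections):
    """Return a copy of labels with the reviewer's corrections applied."""
    fixed = dict(labels)
    fixed.update(corrections)
    return fixed


# Made-up example: "b" is ham that was misfiled in the spam corpus.
labels = {"a": False, "b": True, "c": True, "d": False}
verdicts = {"a": True, "b": False, "c": True, "d": False}

fps, fns = find_disagreements(labels, verdicts)   # fps == ["a"], fns == ["b"]
fixed = apply_review(labels, {"b": False})        # reviewer refiles "b" as ham
fps2, fns2 = find_disagreements(fixed, verdicts)  # fps2 == ["a"], fns2 == []
```

Once "b" is refiled, the false negative disappears without retraining
anything: the classifier was right and the corpus was wrong.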