[Spambayes] Finally getting around to regression testing

Josiah Carlson jcarlson@uci.edu
Sat, 28 Sep 2002 03:35:42 -0700


Hey,

It's kind of funny.  When I was using pasp for just my own stuff, I
didn't do any real testing because I didn't feel like it.  Now that I
expect more people will be using it and checking it out, I decided to
try doing some actual testing on the little guy.

After receiving what was obviously a spam email (the subject I will not
divulge) and watching it sail through two different filters (PaulGraham
scored it at .02, GaryRobinson p/q/s at .49), I was curious what was
going on and why it could get through so easily.

It turned out that the tokens contributing the most spamminess also
didn't show up all that often.  After poking at it for a little while,
I decided to try a simple weighted average, and it turned out to work
pretty well.
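
Roughly the shape of what I mean -- an illustrative sketch only.  The
weighting here (each token's count in the message) is just one natural
choice, not necessarily the only reasonable one:

    def weighted_average(token_probs):
        """token_probs: list of (count_in_message, spam_probability) pairs."""
        total_weight = sum(count for count, _ in token_probs)
        if total_weight == 0:
            return 0.5  # no evidence either way
        return sum(count * prob for count, prob in token_probs) / total_weight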

Then I thought that maybe the bias of doubling the effectiveness of ham
words was a bit overdone, so I also tried dropping the bias entirely.
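
For anyone unfamiliar with the bias: it's the multiplier applied to the
ham (good) counts when computing each token's probability, Paul Graham
style.  A rough sketch (minimum-count thresholds omitted; the names are
illustrative, not pasp's actual code):

    def word_prob(bad_count, good_count, nbad, ngood, bias=2.0):
        """Per-token spam probability; bias=2 is Graham's original,
        bias=1 is 'no bias'.  nbad/ngood are training message counts."""
        g = bias * good_count
        b = float(bad_count)
        if g + b == 0:
            return 0.4  # Graham's default for unseen tokens
        p = min(1.0, b / nbad) / (min(1.0, g / ngood) + min(1.0, b / nbad))
        return max(0.01, min(0.99, p))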

So you want some numbers to see what's going on?

Training corpus:
Spam: 747
Ham: 5145

testtext -> Paul Graham combination
testtext2 -> Gary Robinson p/q/s combination
testtext3 -> weighted average
(What are the accepted names for each of these algorithms, so I can
refer to them consistently in future posts?)
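
For reference, here are the two standard combining rules as I
understand them (sketch only -- token selection, clamping, and
unknown-word handling are omitted; p is the list of per-token spam
probabilities being combined):

    def graham_combine(p):
        prod_p = prod_not_p = 1.0
        for x in p:
            prod_p *= x
            prod_not_p *= 1.0 - x
        return prod_p / (prod_p + prod_not_p)

    def robinson_pqs_combine(p):
        n = len(p)
        prod_p = prod_not_p = 1.0
        for x in p:
            prod_p *= x
            prod_not_p *= 1.0 - x
        P = 1.0 - prod_not_p ** (1.0 / n)   # spamminess
        Q = 1.0 - prod_p ** (1.0 / n)       # hamminess
        S = (P - Q) / (P + Q)               # in [-1, 1]
        return (1.0 + S) / 2.0              # rescaled to [0, 1]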

As you can see, my output is quite different from what is normally
posted...that's because I'm using a completely different piece of
software, though the numbers should be comparable.

For my testing corpus, I only have 16 spams (the most recent 16
received) and 62 hams (random email from my inbox that I've been too
lazy to categorize; it includes two legitimate domain-expiration
notices, which show up later).
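
In the tables below, ctf is the spam cutoff, fn/fp are the false
negative/false positive counts, and fnp/fpp are those counts divided by
16 and 62.  They're generated roughly like this (a sketch of the idea,
not pasp's actual test harness):

    def tabulate(spam_scores, ham_scores,
                 cutoffs=(0.5, 0.6, 0.7, 0.8, 0.9, 0.95)):
        print("ctf\tfn\tfp\tfnp\t\tfpp")
        for ctf in cutoffs:
            fn = sum(1 for s in spam_scores if s < ctf)   # spam scored as ham
            fp = sum(1 for s in ham_scores if s >= ctf)   # ham scored as spam
            print("%g\t%d\t%d\t%f\t%f"
                  % (ctf, fn, fp, fn / len(spam_scores),
                     fp / len(ham_scores)))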

with the bias of 2*good:
results for testtext:
ctf     fn      fp      fnp             fpp
0.5     2       0       0.125000        0.000000
0.6     2       0       0.125000        0.000000
0.7     2       0       0.125000        0.000000
0.8     2       0       0.125000        0.000000
0.9     2       0       0.125000        0.000000
0.95    2       0       0.125000        0.000000

results for testtext2:
ctf     fn      fp      fnp             fpp
0.5     2       0       0.125000        0.000000
0.6     4       0       0.250000        0.000000
0.7     6       0       0.375000        0.000000
0.8     7       0       0.437500        0.000000
0.9     8       0       0.500000        0.000000
0.95    8       0       0.500000        0.000000

results for testtext3:
ctf     fn      fp      fnp             fpp
0.5     3       0       0.187500        0.000000
0.6     4       0       0.250000        0.000000
0.7     5       0       0.312500        0.000000
0.8     5       0       0.312500        0.000000
0.9     6       0       0.375000        0.000000
0.95    6       0       0.375000        0.000000


without the bias:
results for testtext:
ctf     fn      fp      fnp             fpp
0.5     0       2       0.000000        0.032258
0.6     0       2       0.000000        0.032258
0.7     0       2       0.000000        0.032258
0.8     0       2       0.000000        0.032258
0.9     0       2       0.000000        0.032258
0.95    0       2       0.000000        0.032258

results for testtext2:
ctf     fn      fp      fnp             fpp
0.5     0       2       0.000000        0.032258
0.6     3       0       0.187500        0.000000
0.7     3       0       0.187500        0.000000
0.8     4       0       0.250000        0.000000
0.9     5       0       0.312500        0.000000
0.95    5       0       0.312500        0.000000

results for testtext3:
ctf     fn      fp      fnp             fpp
0.5     0       6       0.000000        0.096774
0.6     0       6       0.000000        0.096774
0.7     0       5       0.000000        0.080645
0.8     1       1       0.062500        0.016129
0.9     3       0       0.187500        0.000000
0.95    4       0       0.250000        0.000000


With the bias, there are no false positives in my current inbox, and
testtext performs the best at every cutoff.  The two false negatives are
the spam mentioned above and a "dvd for $.49" spam.  Apparently I
haven't been receiving many spams of either type lately.

In any case, with the bias removed, all three algorithms produce fewer
false negatives but more false positives.  It may make sense to check
different biases between 1 and 2 (something like the sketch below).
I would, but it's 3:30AM here, and I really should get to bed.
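
For the record, the kind of sweep I mean is roughly this (purely
illustrative; score_corpus() is a hypothetical stand-in for "recompute
the token probabilities at this bias, then score every test message" --
it isn't part of pasp):

    for bias in (1.0, 1.25, 1.5, 1.75, 2.0):
        spam_scores, ham_scores = score_corpus(bias)   # hypothetical helper
        print("bias = %g" % bias)
        tabulate(spam_scores, ham_scores)              # from the sketch above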

Goodnight,
 - Josiah