[Spambayes] Timestamp analysis

T. Alexander Popiel popiel@wolfskeep.com
Mon Oct 28 17:50:04 2002


This set of runs took me a lot longer than expected. First, I
had a couple of errors in my scripts that caused result files
to collide; then I wanted to redo the runs, saving pickles for
probing; and finally I discovered that the day-of-week code
was failing (producing dow:invalid) for nearly all my mail.
I have not yet fixed that last problem, so the day-of-week
results say nothing about whether the concept is useful; they
are valid only for the implementation as it currently stands.
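
For illustration, here is a minimal sketch of one way to derive a
day-of-week token from a Date: header. It is NOT the Spambayes
extract_dow code, and it does not diagnose why that code fails.
The helper name dow_token and the approach (computing the day from
the parsed date instead of reading the day name) are assumptions.
It uses the standard library's email.utils.parsedate_to_datetime
and falls back to dow:invalid when the header cannot be parsed.

```python
import email.utils

# Hypothetical token vocabulary; datetime.weekday() returns 0 for Monday.
DAY_NAMES = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]

def dow_token(date_header):
    """Return 'dow:<day>' computed from the parsed date, or 'dow:invalid'.

    Hypothetical helper, not Spambayes's extract_dow implementation.
    """
    try:
        dt = email.utils.parsedate_to_datetime(date_header)
    except (TypeError, ValueError, IndexError):
        # Older Pythons raise TypeError on unparseable input; 3.10+ raises ValueError.
        return "dow:invalid"
    return "dow:" + DAY_NAMES[dt.weekday()]

if __name__ == "__main__":
    print(dow_token("Mon, 28 Oct 2002 17:50:04 -0800"))
    print(dow_token("not a date"))
```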

Also, the implementation of generate_time_buckets seems to
use 10-minute time buckets, not the 6-minute buckets that the
code comments suggest.
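
The bucket width comes down to simple arithmetic on the minute. A
minimal sketch, assuming the bucket is computed by integer division
of the minute (the helper names, regex and token format below are
hypothetical, not the actual tokenizer code): dividing by 10 gives
six buckets per hour, each 10 minutes wide, while genuine 6-minute
buckets would need division by 6, giving ten buckets per hour.

```python
import re

# Hypothetical pattern for an HH:MM:SS time such as "17:50:04" in a Date: header.
HMS_RE = re.compile(r"\b(\d\d?):(\d\d):(\d\d)\b")

def time_bucket_token(date_header, minutes_per_bucket):
    """Return a token naming the hour and the minute bucket, or None if no time is found."""
    mat = HMS_RE.search(date_header)
    if mat is None:
        return None
    hour, minute = int(mat.group(1)), int(mat.group(2))
    # Integer division maps each minute to the bucket it falls in.
    return "time:%02d:%d" % (hour, minute // minutes_per_bucket)

def buckets_per_hour(minutes_per_bucket):
    """How many buckets an hour is split into for a given bucket width."""
    return 60 // minutes_per_bucket

if __name__ == "__main__":
    header = "Mon, 28 Oct 2002 17:50:04 -0800"
    for width in (10, 6):
        print(width, time_bucket_token(header, width), buckets_per_hour(width))
```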

Overall, looking at the date in detail, independent of
anything else, seems neutral.  Almost perfectly so: at most
there was a difference of one unsure (53 vs. 54 between rtd
and rTd), which is not significant.

In the table below, each letter of a column name gives the
setting of one group of options:
    r) mine_received_headers: False
       basic_header_tokenize: False
    R) mine_received_headers: True
       basic_header_tokenize: True

    t) generate_time_buckets: False
    T) generate_time_buckets: True

    d) extract_dow: False
    D) extract_dow: True
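
The three switches give 2 x 2 x 2 = 8 runs. A small sketch of how
the column labels map onto settings (run_configs is a hypothetical
helper, and the dict only mirrors the legend above; it is not the
Spambayes Options API). Iterating over itertools.product in this
order yields the labels in the same order as the table columns.

```python
from itertools import product

def run_configs():
    """Yield (label, settings) for every combination of the three switches."""
    for received, buckets, dow in product([False, True], repeat=3):
        label = ("R" if received else "r") + ("T" if buckets else "t") + ("D" if dow else "d")
        settings = {
            # r/R toggles both received-header options together.
            "mine_received_headers": received,
            "basic_header_tokenize": received,
            "generate_time_buckets": buckets,
            "extract_dow": dow,
        }
        yield label, settings

if __name__ == "__main__":
    for label, settings in run_configs():
        print(label, settings)
```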


-> <stat> tested 200 hams & 200 spams against 1800 hams & 1800 spams
[...]
filename:         rtd       rtD       rTd       rTD       Rtd       RtD       RTd       RTD
ham:spam:   2000:2000 2000:2000 2000:2000 2000:2000 2000:2000 2000:2000 2000:2000 2000:2000
fp total:           3         3         3         3         3         3         3         3
fp %:            0.15      0.15      0.15      0.15      0.15      0.15      0.15      0.15
fn total:          12        12        12        12        12        12        12        12
fn %:            0.60      0.60      0.60      0.60      0.60      0.60      0.60      0.60
unsure t:          53        53        54        54        31        31        31        31
unsure %:        1.32      1.32      1.35      1.35      0.78      0.78      0.78      0.78
real cost:     $52.60    $52.60    $52.80    $52.80    $48.20    $48.20    $48.20    $48.20
best cost:     $48.20    $48.20    $48.20    $48.20    $38.80    $38.80    $38.80    $38.80
h mean:          0.40      0.40      0.40      0.40      0.30      0.30      0.30      0.30
h sdev:          5.39      5.39      5.38      5.38      4.47      4.47      4.48      4.48
s mean:         98.45     98.46     98.46     98.46     98.85     98.85     98.85     98.85
s sdev:          9.76      9.76      9.76      9.75      9.06      9.06      9.06      9.05
mean diff:      98.05     98.06     98.06     98.06     98.55     98.55     98.55     98.55
k:               6.47      6.47      6.48      6.48      7.28      7.28      7.28      7.28
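
Two of the derived rows can be checked by hand. A minimal sketch,
assuming real cost weights of $10 per false positive, $1 per false
negative and $0.20 per unsure (these weights are an assumption, chosen
because they reproduce the real cost row exactly), and assuming k is
the mean difference divided by the sum of the ham and spam standard
deviations. The helper names are hypothetical. Best cost is not
reproduced, since it depends on cutoffs not shown in the table.

```python
# Assumed cost weights (dollars); they match the "real cost" row above.
FP_COST = 10.00
FN_COST = 1.00
UNSURE_COST = 0.20

# Per-column figures copied from the table above.
COLUMNS = ["rtd", "rtD", "rTd", "rTD", "Rtd", "RtD", "RTd", "RTD"]
FP = [3] * 8
FN = [12] * 8
UNSURE = [53, 53, 54, 54, 31, 31, 31, 31]
H_MEAN = [0.40] * 4 + [0.30] * 4
H_SDEV = [5.39, 5.39, 5.38, 5.38, 4.47, 4.47, 4.48, 4.48]
S_MEAN = [98.45, 98.46, 98.46, 98.46, 98.85, 98.85, 98.85, 98.85]
S_SDEV = [9.76, 9.76, 9.76, 9.75, 9.06, 9.06, 9.06, 9.05]

def real_cost(fp, fn, unsure):
    """Weighted error cost at the cutoffs actually used (assumed weights)."""
    return fp * FP_COST + fn * FN_COST + unsure * UNSURE_COST

def separation_k(h_mean, h_sdev, s_mean, s_sdev):
    """Mean difference divided by the sum of the two standard deviations."""
    return (s_mean - h_mean) / (h_sdev + s_sdev)

if __name__ == "__main__":
    for i, name in enumerate(COLUMNS):
        cost = real_cost(FP[i], FN[i], UNSURE[i])
        k = separation_k(H_MEAN[i], H_SDEV[i], S_MEAN[i], S_SDEV[i])
        print(f"{name}: real cost ${cost:.2f}, k {k:.2f}")
```

Run against the table values, this reproduces the real cost and k rows
for all eight columns, and shows that the whole r-to-R cost drop comes
from the unsure count (53 to 31 unsures is 22 x $0.20 = $4.40).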

I have not yet posted this on my website...

- Alex