[Spambayes] Timestamp analysis
T. Alexander Popiel
popiel@wolfskeep.com
Mon Oct 28 17:50:04 2002
This set of runs took me a lot longer than expected: first
I had a couple of errors in my scripts that caused result
files to collide, then I wanted to do it again saving pickles
for probing, and finally I discovered that the day-of-week
stuff was failing (getting dow:invalid) for nearly all my mail.
I have not yet fixed that last problem, so the day-of-week
results do not test the concept itself, only the implementation
as it currently stands.
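For reference, here is a minimal sketch of a more forgiving
day-of-week extraction. It is not the Spambayes code; the
function name dow_token and the numbering (Monday = 0) are
assumptions for illustration. It leans on the standard
library's email.utils.parsedate_tz instead of matching fixed
date formats, and falls back to dow:invalid when the header
cannot be parsed.

```python
import datetime
from email.utils import parsedate_tz


def dow_token(date_header):
    """Return a 'dow:N' token for a Date header, or 'dow:invalid'.

    Hypothetical helper, not the Spambayes implementation.
    N follows datetime.date.weekday(): Monday is 0, Sunday is 6.
    """
    parsed = parsedate_tz(date_header)
    if parsed is None:
        return "dow:invalid"
    try:
        # The weekday field of the parsed tuple is not usable,
        # so recompute it from year, month and day.
        day = datetime.date(*parsed[:3]).weekday()
    except ValueError:
        return "dow:invalid"
    return "dow:%d" % day


print(dow_token("Mon, 28 Oct 2002 17:50:04 -0800"))  # dow:0
print(dow_token("not a date"))                       # dow:invalid
```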
Also, the implementation of generate_time_buckets seems to
use 10-minute time buckets, not the 6-minute buckets that the
code comments suggest.
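To illustrate the difference, here is a small sketch (not the
actual Spambayes code; the regular expression, the function
name time_bucket and the token format are assumptions). Integer
division of the minute by 10 gives six buckets per hour, each
10 minutes wide, while dividing by 6 gives ten buckets per hour,
each 6 minutes wide, which is an easy pair to mix up.

```python
import re

# Hypothetical pattern: just finds the first HH:MM in a Date header.
HM_RE = re.compile(r"\b(?P<hour>\d{1,2}):(?P<minute>\d{2})\b")


def time_bucket(date_header, minutes_per_bucket=10):
    """Return a 'time:HOUR:BUCKET' token, or None if no time is found."""
    mat = HM_RE.search(date_header)
    if mat is None:
        return None
    hour = int(mat.group("hour"))
    bucket = int(mat.group("minute")) // minutes_per_bucket
    return "time:%d:%d" % (hour, bucket)


header = "Mon, 28 Oct 2002 17:50:04 -0800"
print(time_bucket(header, 10))  # time:17:5  (10-minute buckets, 0..5)
print(time_bucket(header, 6))   # time:17:8  (6-minute buckets, 0..9)
```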
Overall, looking at the date in detail, independently of
anything else, seems neutral. Almost perfectly so: at most
there was a difference of one unsure, which is not significant.
In the table below,
  r) mine_received_headers: False
     basic_header_tokenize: False
  R) mine_received_headers: True
     basic_header_tokenize: True
  t) generate_time_buckets: False
  T) generate_time_buckets: True
  d) extract_dow: False
  D) extract_dow: True
-> <stat> tested 200 hams & 200 spams against 1800 hams & 1800 spams
[...]
filename:        rtd       rtD       rTd       rTD       Rtd       RtD       RTd       RTD
ham:spam:  2000:2000 2000:2000 2000:2000 2000:2000 2000:2000 2000:2000 2000:2000 2000:2000
fp total:          3         3         3         3         3         3         3         3
fp %:           0.15      0.15      0.15      0.15      0.15      0.15      0.15      0.15
fn total:         12        12        12        12        12        12        12        12
fn %:           0.60      0.60      0.60      0.60      0.60      0.60      0.60      0.60
unsure t:         53        53        54        54        31        31        31        31
unsure %:       1.32      1.32      1.35      1.35      0.78      0.78      0.78      0.78
real cost:    $52.60    $52.60    $52.80    $52.80    $48.20    $48.20    $48.20    $48.20
best cost:    $48.20    $48.20    $48.20    $48.20    $38.80    $38.80    $38.80    $38.80
h mean:         0.40      0.40      0.40      0.40      0.30      0.30      0.30      0.30
h sdev:         5.39      5.39      5.38      5.38      4.47      4.47      4.48      4.48
s mean:        98.45     98.46     98.46     98.46     98.85     98.85     98.85     98.85
s sdev:         9.76      9.76      9.76      9.75      9.06      9.06      9.06      9.05
mean diff:     98.05     98.06     98.06     98.06     98.55     98.55     98.55     98.55
k:              6.47      6.47      6.48      6.48      7.28      7.28      7.28      7.28
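As a sanity check on the "real cost" row: the figures are
consistent with charging $10 per false positive, $1 per false
negative and $0.20 per unsure. Those weights are an assumption
here (they are not stated in this message), but they reproduce
the table. A minimal sketch, working in cents to avoid
floating-point noise:

```python
# Assumed weights, in cents; not stated in the message itself.
FP_CENTS = 1000     # $10.00 per false positive
FN_CENTS = 100      # $1.00 per false negative
UNSURE_CENTS = 20   # $0.20 per unsure


def real_cost_cents(fp, fn, unsure):
    """Total cost in cents for the given error counts."""
    return fp * FP_CENTS + fn * FN_CENTS + unsure * UNSURE_CENTS


def as_dollars(cents):
    """Format an integer number of cents as a dollar string."""
    return "$%d.%02d" % divmod(cents, 100)


# fp and fn are 3 and 12 in every column; only unsure varies.
for name, unsure in [("rtd", 53), ("rTd", 54), ("Rtd", 31)]:
    print(name, as_dollars(real_cost_cents(3, 12, unsure)))
# rtd $52.60
# rTd $52.80
# Rtd $48.20
```

Since fp and fn are identical in every column, the whole
difference in real cost comes from the unsure counts.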
I have not yet posted this on my website...
- Alex