[Spambayes] RE: Spam vs time-of-day
Skip Montanaro
skip@pobox.com
Tue Oct 29 04:19:54 2002
Tim> Your buckets span 10 minutes. The comment in the code is confused
Tim> about this too. That's why your graph and mine both have 144
Tim> points on the X axis (24 * 6 = 144; you have six *buckets* per
Tim> hour, and each spans 10 minutes).
Yeah, after seeing this several times I'm beginning to think I made a
mistake. ;-)
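
For reference, the arithmetic Tim describes can be sketched roughly as below. This is only an illustration of 10-minute bucketing, not the code that produced either graph; the names `BUCKET_MINUTES`, `BUCKETS_PER_HOUR`, `BUCKETS_PER_DAY` and `bucket_index` are invented for the example.

```python
# Hypothetical names; a sketch of the 10-minute bucketing arithmetic.
BUCKET_MINUTES = 10
BUCKETS_PER_HOUR = 60 // BUCKET_MINUTES    # six buckets per hour
BUCKETS_PER_DAY = 24 * BUCKETS_PER_HOUR    # 24 * 6 = 144 points on the X axis


def bucket_index(hour, minute):
    """Map a time of day to its bucket number, 0 through 143."""
    return hour * BUCKETS_PER_HOUR + minute // BUCKET_MINUTES
```

With this layout, every message sent between 14:20 and 14:29 lands in the same bucket.
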
>> The large spike at 0 is an artifact of my simpleminded Date header
>> scanning. Invalid dates probably wound up with a value of 0.
Tim> And at that time, *every* Date header generated a dow:invalid token
Tim> (as well as the correct token, when possible). That's been
Tim> repaired since then.
Not really. The graph was generated by a shell pipeline using suitable
non-spambayes tools (awk, sed, gnuplot, etc). My dow:invalid mistake came
later.
>> Buckets were calculated using local time. That way I didn't penalize
>> Anthony Baxter and other folks who happen not to live in the US.
Tim> I'm unsure what "were calculated using local time" means.
Simply that I ignored timezone information. If the Date: header was

    Date: Mon, 28 Oct 2002 14:29:30 -0500

the send time was taken to be 14:29, local time. The -0500 was ignored.
Tim> Does the checked in code do that or not?
Yes, the checked in code just uses a regular expression which matches
HH:MM:SS preceded and followed by a space. Nothing else in the Date: header
is considered for this particular token.
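
A rough sketch of that kind of match follows. The pattern here is an assumption made for illustration and the exact regular expression in the checked-in code may differ; `TIME_RE` and `local_send_time` are names invented for this example.

```python
import re

# Assumed pattern: HH:MM:SS with a single space on each side.
TIME_RE = re.compile(r" (\d\d):(\d\d):(\d\d) ")


def local_send_time(date_header):
    """Return (hour, minute) exactly as written in the header, or None.

    Any timezone offset such as -0500 is deliberately ignored, so the
    result is the sender's local wall-clock time.
    """
    match = TIME_RE.search(date_header)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

Given the header value "Mon, 28 Oct 2002 14:29:30 -0500", this returns (14, 29). A header with no recognizable time yields None, which the caller has to handle separately rather than lumping it in with midnight.
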
Skip