[Spambayes] RE: Spam vs time-of-day

Skip Montanaro skip@pobox.com
Tue Oct 29 04:19:54 2002


    Tim> Your buckets span 10 minutes.  The comment in the code is confused
    Tim> about this too.  That's why your graph and mine both have 144
    Tim> points on the X axis (24 * 6 = 144; you have six *buckets* per
    Tim> hour, and each spans 10 minutes).

Yeah, after seeing this several times I'm beginning to think I made a
mistake. ;-)
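
For reference, the bucket arithmetic Tim describes can be sketched as follows.
This is only an illustration of the 24 * 6 = 144 count; the names
MINUTES_PER_BUCKET, BUCKETS_PER_DAY and bucket_index are made up here and are
not taken from the checked-in code.

```python
# Hypothetical names, chosen for illustration only.
MINUTES_PER_BUCKET = 10
BUCKETS_PER_DAY = 24 * 60 // MINUTES_PER_BUCKET  # six buckets per hour, 144 per day


def bucket_index(hour, minute):
    """Map a time of day to the index of its 10-minute bucket (0..143)."""
    return (hour * 60 + minute) // MINUTES_PER_BUCKET
```

With these definitions 00:00 falls in bucket 0 and 23:59 in bucket 143, which
gives the 144 points on the X axis.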

    >> The large spike at 0 is an artifact of my simpleminded Date header
    >> scanning.  Invalid dates probably wound up with a value of 0.

    Tim> And at that time, *every* Date header generated a dow:invalid token
    Tim> (as well as the correct token, when possible).  That's been
    Tim> repaired since then.

Not really.  The graph was generated by a shell pipeline using suitable
non-spambayes tools (awk, sed, gnuplot, etc.).  My dow:invalid mistake came
later.

    >> Buckets were calculated using local time.  That way I didn't penalize
    >> Anthony Baxter and other folks who happen not to live in the US.

    Tim> I'm unsure what "were calculated using local time" means.  

Simply that I ignored timezone information.  If the Date: header was

    Date: Mon, 28 Oct 2002 14:29:30 -0500

the send time was taken to be 14:29, local time.  The -0500 was ignored.
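
To make the difference concrete, here is a small sketch that contrasts the
time as written in the header with the same instant in UTC.  It uses the
standard library's email.utils rather than anything from spambayes, and the
function name written_and_utc_time is invented for this example.

```python
import time
from email.utils import mktime_tz, parsedate_tz


def written_and_utc_time(date_header):
    """Return ((hour, minute) as written, (hour, minute) in UTC or None)."""
    parsed = parsedate_tz(date_header)
    if parsed is None:
        return None  # header could not be parsed at all
    written = (parsed[3], parsed[4])  # the offset plays no part in this value
    if parsed[9] is None:
        return written, None  # no usable offset, so no UTC conversion
    utc = time.gmtime(mktime_tz(parsed))  # mktime_tz applies the offset
    return written, (utc.tm_hour, utc.tm_min)
```

For the header above this gives (14, 29) as written and (19, 29) in UTC.
Bucketing on the first value is what "local time" means here.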

    Tim> Does the checked in code do that or not?  

Yes, the checked-in code just uses a regular expression that matches
HH:MM:SS preceded and followed by a space.  Nothing else in the Date: header
is considered for this particular token.
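
A rough sketch of that kind of match is below.  The pattern and the names
TIME_RE and extract_time are assumptions made for illustration; the actual
expression in the checked-in code may differ.

```python
import re

# Assumed pattern: HH:MM:SS with a literal space on either side.
TIME_RE = re.compile(r" (\d\d):(\d\d):(\d\d) ")


def extract_time(date_header):
    """Return (hour, minute, second) as written, or None if nothing matches."""
    match = TIME_RE.search(date_header)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())
```

Note that a pattern like this finds nothing in a header where the time is the
last field with no trailing space, and the timezone offset never enters into
the result.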

Skip