[Spambayes] Mining the headers

Skip Montanaro skip@pobox.com
Sun Oct 27 00:18:25 2002


    Alex> Tim mentioned three tokenizer options (mine_received_headers,
    Alex> count_all_header_lines, basic_header_tokenize).  I hadn't played
    Alex> with these yet, so I ran the 8 combinations of these.

I've had three other options knocking around locally that haven't seemed to
help or hurt when applied to my collections: mine_date_headers,
generate_time_buckets, and extract_dow.  The first controls whether the Date:
header gets mined at all.  The second generates tokens like time:12:3 (the
third six-minute bucket of the twelfth hour).  The third generates tokens like
dow:0 (Monday).  Should I check them in so other people can see whether they
are useful?  (My fp & fn results seem to differ a bit from everyone else's.)
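For anyone who wants to try the idea before anything is checked in, here is a
rough sketch of the kind of tokens the last two options produce.  It is not
the code I have locally.  The function name date_tokens is made up, and it
assumes the bucket index is minute // 6, that the hour and minute are taken
as written in the header (no conversion to UTC), and that day-of-week numbers
follow Python's weekday() convention (Monday == 0).

```python
from datetime import datetime
from email.utils import parsedate_tz


def date_tokens(date_header, bucket_minutes=6):
    """Hypothetical sketch: time-bucket and day-of-week tokens from a Date: header."""
    parsed = parsedate_tz(date_header)
    if parsed is None:
        return []  # unparseable Date: header yields no tokens
    year, month, day, hour, minute = parsed[:5]
    # Assumed bucketing: minutes 0-5 -> 0, 6-11 -> 1, 12-17 -> 2, 18-23 -> 3, ...
    tokens = ["time:%d:%d" % (hour, minute // bucket_minutes)]
    # parsedate_tz doesn't compute the weekday, so derive it from the date.
    # weekday() numbers Monday as 0, matching the dow:0 (Monday) example.
    tokens.append("dow:%d" % datetime(year, month, day).weekday())
    return tokens


print(date_tokens("Mon, 28 Oct 2002 12:20:00 -0500"))
```

That prints ['time:12:3', 'dow:0'].  Keeping the header's local time rather
than normalizing to UTC is deliberate here: the hypothesis is that the
sender's local clock, not absolute time, is what might separate spam from ham.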

Skip