[Spambayes] Mining the headers
Skip Montanaro
skip@pobox.com
Sun Oct 27 00:18:25 2002
Alex> Tim mentioned three tokenizer options (mine_received_headers,
Alex> count_all_header_lines, basic_header_tokenize). I hadn't played
Alex> with these yet, so I ran the 8 combinations of these.
I've had three other options knocking around locally which haven't seemed to
help or hurt when applied to my collections: mine_date_headers,
generate_time_buckets, and extract_dow. The first controls overall attention
to the Date: header. The second generates tokens like time:12:3 (the third
six-minute bucket of the twelfth hour). The third generates tokens like
dow:0 (Monday). Should I check them in to see if they are useful for other
people? (I seem to get somewhat different fp & fn results than others.)
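To make the token shapes concrete, here is a rough sketch of what the last two options might emit for a single Date: header value. It is not the code under discussion: the function name date_tokens, the BUCKET_MINUTES constant, and the choice to count buckets from zero are all assumptions made for illustration, and an unparseable header simply yields nothing here.

```python
from email.utils import parsedate_to_datetime

# Assumption: bucket width taken from the "six-minute bucket" description.
BUCKET_MINUTES = 6


def date_tokens(date_header, time_buckets=True, dow=True):
    """Yield hypothetical time:/dow: tokens for one Date: header value."""
    try:
        when = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        # Unparseable header: this sketch emits no tokens at all.
        return
    if time_buckets:
        # Hour on a 24-hour clock, bucket index counted from zero (assumed).
        yield "time:%d:%d" % (when.hour, when.minute // BUCKET_MINUTES)
    if dow:
        # datetime.weekday() numbers Monday as 0, matching dow:0 (Monday).
        yield "dow:%d" % when.weekday()


if __name__ == "__main__":
    print(list(date_tokens("Mon, 28 Oct 2002 12:20:00 +0000")))
```

Under these assumptions, a message dated Monday at 12:20 produces the same two example tokens quoted above, time:12:3 and dow:0.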
Skip