[Mailman-Developers] mm2.1 - DEFAULT_PLAIN_DIGEST_KEEP_HEADERS

Thu Jan 16 19:32:38 EST 2003

    DC> I haven't thought of a real good way to set up the training,
    DC> though.  You want to be able to select particular messages
    DC> from the archives and feed them into the classifier intact
    DC> (headers and everything.)  That sounds like writing a new
    DC> archive index from the ground up (which might be a good thing
    DC> anyway, but I hope I'm wrong and that isn't necessary!)

If pre-training really is necessary, then we have a few options:

- We can include a small collection of known spam to do the initial
  spam training.  We'd train the list on this stuff when the feature
  was enabled.  That should be easy to do, although we may have to
  update the canned spam now and then.

- If there have been any messages to the list, and if any of these
  messages have been held for approval (e.g. moderator flag starts out
  turned on), then we have a corpus of known good messages.  We don't
  have to scan the mbox file; we can just train on approved, held ham.

- If no messages have been posted yet, Tim Peters suggested training
  on the list's welcome message.  That, plus the spam and unsure holds
  that may result from initial spam seeding, may be enough to warm up
  the classifier.
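
Here's a rough sketch of how those three options might fit together
when the feature gets turned on.  Everything in it is hypothetical:
ToyClassifier is a stand-in that only counts tokens, not the real
classifier, and warm_up() and its arguments are made-up names, not
anything in Mailman today.  A real version would feed in whole
messages, headers and all, rather than plain strings.

```python
import re
from collections import Counter


class ToyClassifier:
    """Hypothetical stand-in for the real classifier; it only counts tokens."""

    def __init__(self):
        self.spam_tokens = Counter()
        self.ham_tokens = Counter()
        self.nspam = 0
        self.nham = 0

    def learn(self, text, is_spam):
        # Count each distinct token once per message.
        tokens = set(re.findall(r"[a-z0-9$']+", text.lower()))
        if is_spam:
            self.spam_tokens.update(tokens)
            self.nspam += 1
        else:
            self.ham_tokens.update(tokens)
            self.nham += 1


def warm_up(classifier, canned_spam, approved_held, welcome_message):
    """Seed a classifier when the feature is first enabled (assumed workflow).

    canned_spam     -- known spam shipped with the software
    approved_held   -- held messages that a moderator approved (known ham)
    welcome_message -- the list's welcome text, used as ham only when
                       no approved messages exist yet
    """
    for msg in canned_spam:
        classifier.learn(msg, is_spam=True)

    # Prefer real approved traffic; fall back to the welcome message.
    ham = list(approved_held) or [welcome_message]
    for msg in ham:
        classifier.learn(msg, is_spam=False)
    return classifier


# A brand new list: no approved posts yet, so the welcome message is the ham.
fresh = warm_up(ToyClassifier(),
                canned_spam=["Buy cheap pills now", "You won $$$ click here"],
                approved_held=[],
                welcome_message="Welcome to the example-list mailing list!")
```

The fallback is deliberately dumb: as soon as there is any approved,
held ham, the welcome message is not used at all.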

-Barry