[Mailman-Developers] mm2.1 - DEFAULT_PLAIN_DIGEST_KEEP_HEADERS
Barry A. Warsaw
barry at python.org
Thu Jan 16 19:32:38 EST 2003
DC> I haven't thought of a real good way to set up the training,
DC> though. You want to be able to select particular messages
DC> from the archives and feed them into the classifier intact
DC> (headers and everything.) That sounds like writing a new
DC> archive index from the ground up (which might be a good thing
DC> anyway, but I hope I'm wrong and that isn't necessary!)
If pre-training really is necessary then we have a couple of options:
- We can include a small collection of known spam to do the initial
spam training. We'd train the list on this stuff when the feature
was enabled. That should be easy to do, although we may have to
update the canned spam now and then.
- If there have been any messages to the list, and if any of these
messages have been held for approval (e.g. because the moderator flag
starts out turned on), then we have a corpus of known good messages.
We don't have to scan the mbox file; we can just train on the held
messages that were approved, i.e. known ham.
- If no messages have been posted yet, Tim Peters suggested training
on the list's welcome message. That, plus the spam and unsure holds
that may result from initial spam seeding, may be enough to warm up
the classifier.
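
To make the three options concrete, here is a rough sketch of how the
seeding could be wired together. Everything in it is hypothetical:
TinyClassifier is a toy token counter standing in for a real Bayesian
classifier, and seed_classifier() and its arguments are names made up
for illustration, not existing Mailman or SpamBayes APIs. It trains on
the canned spam, then on approved held messages if there are any, and
falls back to the welcome message otherwise.

```python
from collections import Counter


class TinyClassifier:
    """Toy stand-in for a real classifier; NOT the SpamBayes API."""

    def __init__(self):
        self.spam = Counter()   # token -> number of spam messages containing it
        self.ham = Counter()    # token -> number of ham messages containing it
        self.nspam = 0
        self.nham = 0

    def train(self, text, is_spam):
        # Count each distinct token once per message.
        tokens = set(text.lower().split())
        if is_spam:
            self.spam.update(tokens)
            self.nspam += 1
        else:
            self.ham.update(tokens)
            self.nham += 1


def seed_classifier(classifier, canned_spam, approved_held, welcome_text):
    """Warm up a classifier when the feature is first enabled.

    canned_spam   -- list of known spam texts shipped with the software
    approved_held -- list of held messages the moderator approved (ham)
    welcome_text  -- the list's welcome message, used only as a fallback
    Returns (number of spam trained, number of ham trained).
    """
    for msg in canned_spam:
        classifier.train(msg, is_spam=True)

    # Prefer real approved postings; fall back to the welcome message
    # only when nothing has been approved yet.
    ham = list(approved_held) or [welcome_text]
    for msg in ham:
        classifier.train(msg, is_spam=False)

    return len(canned_spam), len(ham)
```

A brand-new list would call seed_classifier(c, canned, [], welcome) and
end up with one ham message, the welcome text. A list with moderation
history would pass its approved held messages instead, and the welcome
text would not be used at all.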
-Barry