[spambayes-dev] Mapping tokens to messages
skip at pobox.com
Tue Jan 6 11:12:25 EST 2004
I just checked in two new scripts to the utilities directory.
mkreversemap.py builds a map file which maps features to mail files and
message-ids. extractmessages.py takes a set of features and generates one
or two Unix mbox format files containing the messages which have those
features.
You generate such a map file with something like

    mkreversemap.py -d features.db -t spam Data/Spam
    mkreversemap.py -d features.db -t ham Data/Ham

The final argument (Data/Spam and Data/Ham here) can be any sort of mail
source acceptable to spambayes.mboxutils.getmbox(). Each key maps a feature
to a two-element tuple. Each tuple contains a dict which maps filenames to
sets of message-ids.
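To make that layout concrete, here is a minimal Python sketch that uses a plain in-memory dict rather than the on-disk map file. It is not the code from mkreversemap.py: the helper name add_message, the (ham, spam) ordering inside the tuple, and the sample file names and message-ids are all assumptions made for illustration.

```python
def add_message(revmap, feature, kind, filename, msgid):
    """Record that `feature` occurs in message `msgid` of mail file `filename`.

    revmap maps feature -> (ham, spam). Each element is a dict mapping a
    filename to the set of message-ids in that file which have the feature.
    The (ham, spam) ordering is an assumption made for this sketch.
    """
    if kind not in ("ham", "spam"):
        raise ValueError("kind must be 'ham' or 'spam'")
    ham, spam = revmap.setdefault(feature, ({}, {}))
    target = ham if kind == "ham" else spam
    target.setdefault(filename, set()).add(msgid)


# Hypothetical sample data, only to show the resulting shape.
revmap = {}
add_message(revmap, "list-post:", "ham", "Data/Ham/lists.mbox", "<1@example.com>")
add_message(revmap, "list-post:", "ham", "Data/Ham/lists.mbox", "<2@example.com>")
add_message(revmap, "list-post:", "spam", "Data/Spam/junk.mbox", "<3@example.com>")

ham, spam = revmap["list-post:"]
print(sorted(ham["Data/Ham/lists.mbox"]))
print(sorted(spam["Data/Spam/junk.mbox"]))
```

Keeping message-ids in sets means a message that is mapped twice is only recorded once per file.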
With extractmessages.py, features are typically specified explicitly using
the -f flag, but a message source containing messages with
X-Spambayes-Evidence headers can also be given as a feature source. Use it
like so:

    extractmessages.py -d features.db -f "list-post:" -H msgs.ham

to generate an mbox file called msgs.ham containing all messages referenced
in features.db which contain the feature "list-post:". You can give both -H
and -S flags to generate both ham and spam mbox files. You can give
multiple -f flags as well.
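The lookup implied by that command can be sketched roughly as follows. This is not the code from extractmessages.py. It assumes the map is a plain dict of the form feature -> (ham, spam), where each element maps a filename to a set of message-ids, that every mail file is a Unix mbox readable by the standard mailbox module, and that multiple features select the union of their messages. The function names select_ids and copy_matching are hypothetical.

```python
import mailbox


def select_ids(revmap, features, kind):
    """Return {filename: set of message-ids} for messages of the given kind
    ('ham' or 'spam') which have at least one of the features."""
    index = 0 if kind == "ham" else 1   # assumed (ham, spam) tuple ordering
    wanted = {}
    for feature in features:
        if feature not in revmap:
            continue                    # feature never seen while mapping
        for filename, msgids in revmap[feature][index].items():
            wanted.setdefault(filename, set()).update(msgids)
    return wanted


def copy_matching(wanted, outpath):
    """Append every wanted message to the mbox at outpath; return the count."""
    out = mailbox.mbox(outpath)
    copied = 0
    try:
        for filename, msgids in wanted.items():
            src = mailbox.mbox(filename, create=False)
            try:
                for msg in src:
                    if msg.get("message-id") in msgids:
                        out.add(msg)
                        copied += 1
            finally:
                src.close()
        out.flush()
    finally:
        out.close()
    return copied
```

Under those assumptions, copy_matching(select_ids(revmap, ["list-post:"], "ham"), "msgs.ham") would play the role of the -f "list-post:" -H msgs.ham invocation, and a second call with "spam" would cover -S.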
You can also give a suitable mail source instead of -f flags. All features
which appear in any X-Spambayes-Evidence headers will be used:

    python extractmessages.py -d newmap.db -H msgids.ham mailbox

I don't think this is as useful, though, because it's like drinking from a
firehose: you generally have to deal with far too many messages.
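To see why a mail source produces so many features, here is a small Python sketch of pulling them out of X-Spambayes-Evidence headers. It assumes the header value is a "; "-separated list of quoted-token: probability pairs and that '*H*' and '*S*' are score pseudo-tokens rather than real features. Both the assumed layout and the function name evidence_features are illustrative, not taken from extractmessages.py.

```python
import email
import re

# Assumed header layout: 'token': 0.99; 'other token': 0.01; ...
CLUE_RE = re.compile(r"""(?P<q>['"])(?P<token>.+?)(?P=q): \d\.\d+""")


def evidence_features(raw_message):
    """Return the set of features named in X-Spambayes-Evidence headers."""
    msg = email.message_from_string(raw_message)
    features = set()
    for header in msg.get_all("X-Spambayes-Evidence", []):
        text = " ".join(header.split())       # undo header folding
        for match in CLUE_RE.finditer(text):
            token = match.group("token")
            if token not in ("*H*", "*S*"):   # assumed score pseudo-tokens
                features.add(token)
    return features
```

Every message contributes all of its evidence tokens, so the union over a whole mailbox grows quickly, which is the firehose effect described above.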
Have fun with it.