[spambayes-dev] Mapping tokens to messages

Tue Jan 6 11:12:25 EST 2004

I just checked in two new scripts to the utilities directory.
mkreversemap.py builds a map file which maps features to mail files and
message-id's.  extractmessages.py takes a set of features and generates one
or two Unix mbox format files containing the messages which have those
features.

You generate such a map file with something like

    mkreversemap.py -d features.db -t spam Data/Spam
    mkreversemap.py -d features.db -t ham Data/Ham

The mail sources (Data/Spam and Data/Ham here) can be any sort of mail
source acceptable to spambayes.mboxutils.getmbox().  Each key in the map
file is a feature, and its value is a two-element tuple, one element for ham
and one for spam.  Each element is a dict which maps filenames to sets of
message-id's.
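
A minimal sketch of that structure in Python may make it easier to picture.
It is not the code in mkreversemap.py: a plain dict stands in for the map
file, add_to_map() is a hypothetical helper, and the (ham, spam) ordering of
the tuple is an assumption.

```python
# Sketch only: a plain dict stands in for the on-disk map file, and the
# (ham, spam) ordering of the tuple is an assumption, not taken from
# mkreversemap.py.

def add_to_map(feature_map, feature, kind, filename, msgid):
    """Record that message `msgid` in mail source `filename` has `feature`."""
    # Each feature maps to a two-element tuple of dicts.
    ham, spam = feature_map.setdefault(feature, ({}, {}))
    target = ham if kind == "ham" else spam
    # Each dict maps a filename to the set of message-ids seen in it.
    target.setdefault(filename, set()).add(msgid)


feature_map = {}
add_to_map(feature_map, "list-post:", "ham", "Data/Ham", "<a@example.com>")
add_to_map(feature_map, "list-post:", "ham", "Data/Ham", "<b@example.com>")
add_to_map(feature_map, "list-post:", "spam", "Data/Spam", "<c@example.com>")

ham, spam = feature_map["list-post:"]
print(sorted(ham["Data/Ham"]))
print(spam)
```

Using sets means a message-id is recorded once per file however many times
the feature appears in the message.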

With extractmessages.py, features are typically specified explicitly using
the -f flag, but a message source containing messages with
X-Spambayes-Evidence headers can also be given as a feature source.  Use the
-f flag like so:

    extractmessages.py -d features.db -f "list-post:" -H msgs.ham

to generate an mbox file called msgs.ham containing all messages referenced
in features.db which contain the feature "list-post:".  You can give both -H
and -S flags to generate both ham and spam mbox files.  You can give
multiple -f flags as well.
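
The lookup behind this can be sketched roughly as follows.  This is not the
code in extractmessages.py: messages_with_features() is a hypothetical
helper, the map is a plain dict of (ham, spam) tuples, and treating multiple
features as "any of these" rather than "all of these" is an assumption.

```python
# Sketch only: collects, per mail file, the message-ids that carry any of
# the requested features.  The (ham, spam) tuple order and the "any of"
# semantics are assumptions.

def messages_with_features(feature_map, features, kind):
    """Return {filename: set of message-ids} for the given features."""
    index = 0 if kind == "ham" else 1
    wanted = {}
    for feature in features:
        # Features missing from the map simply contribute nothing.
        per_file = feature_map.get(feature, ({}, {}))[index]
        for filename, msgids in per_file.items():
            wanted.setdefault(filename, set()).update(msgids)
    return wanted


example_map = {
    "list-post:": ({"Data/Ham": {"<a@example.com>"}}, {}),
    "subject:free": ({"Data/Ham": {"<b@example.com>"}},
                     {"Data/Spam": {"<c@example.com>"}}),
}

ham_hits = messages_with_features(
    example_map, ["list-post:", "subject:free"], "ham")
spam_hits = messages_with_features(
    example_map, ["list-post:", "subject:free"], "spam")
print(ham_hits)
print(spam_hits)
```

With the message-ids in hand, writing the mbox file is a matter of reading
each mail file again and keeping the messages whose message-id is in the set
for that file.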

You can also give a suitable mail source instead of -f flags.  All features
which appear in any X-Spambayes-Evidence headers will be used:

    python extractmessages.py -d newmap.db -H msgids.ham mailbox

I don't think this is as useful, because it's like drinking from a
firehose: you generally have to deal with far too many messages.

Have fun with it.

Skip