[spambayes-dev] Generating a sample training database

Tim Peters tim.one at comcast.net
Wed Sep 17 22:31:16 EDT 2003


[Skip Montanaro]
> In theory, however I don't think it's trivial for people using the
> Outlook plugin to train on a single mail message.  They'd have to
> move several messages from valid hammy mailboxes to a new one, train
> on it, then move the messages back to their original locations.

It's not that hard, although I don't know how many users understand all the
things they can do with Outlook.  For example, I keep distinct ham and spam
training folders (and those are *all* I train from).  When I want to train
on, e.g., a selection of ham, I Ctrl-Left-Click the ones I want, and drag
the multi-selection to the ham training folder while holding the right mouse
button down.  When the button is released, a little menu pops up asking
whether I want to Move (the messages), Copy (the messages), or Cancel
(forget the whole thing).  I select Copy, and that's the end of it.  It
takes much longer to read this sentence than to perform the whole operation.

There's an even simpler way to copy:  drag the selection to the desired
folder while holding the Ctrl key down.  I can never remember that, though
(when using extreme shortcuts, I always end up copying when I want to move,
and vice versa), so I stick to the method that asks me what I want when it's
nearly over.  Btw, the same thing (holding the right button down while
dragging) works in Windows Explorer for copying, moving, or linking files
between folders.

> We'll just have to try it and see.
>
> I'll take the lead in grabbing the ham and spam and putting together a
> sample training database

Cool!  Thank you.

> (pickle format seems easiest).  If you'd like to contribute (no more
> than two ham and two spam per person please), forward such messages to
> me and make sure the Subject: includes "Sample Ham" or "Sample Spam".
> I will filter such messages out using procmail before SpamBayes can
> see them.

Offhand, I suggest disabling all header-line clue generation except for
the Subject line.
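
To make that concrete, here is a minimal sketch of building such a
database with clues drawn only from the Subject line and the plain-text
body.  It uses just the Python standard library, not the SpamBayes
tokenizer or classifier, and the names label_from_subject, tokenize, and
build_database are hypothetical, as is the dict-of-Counters layout that
gets pickled.  It also assumes the "Sample Ham"/"Sample Spam" marker
should be stripped before tokenizing, so the marker itself can't turn
into a clue.

```python
import pickle
import re
from collections import Counter
from email import message_from_string

# Assumed marker format, taken from the Subject convention described above.
MARKER_RE = re.compile(r"sample\s+(ham|spam)", re.IGNORECASE)
WORD_RE = re.compile(r"[a-z0-9']+")


def label_from_subject(msg):
    """Return 'ham' or 'spam' from the Subject marker, or None if absent."""
    match = MARKER_RE.search(str(msg.get("Subject", "")))
    return match.group(1).lower() if match else None


def tokenize(msg):
    """Yield tokens from the Subject line and text/plain body only."""
    # Drop the contributor's marker so it cannot become a perfect clue.
    subject = MARKER_RE.sub("", str(msg.get("Subject", "")))
    for word in WORD_RE.findall(subject.lower()):
        yield "subject:" + word
    # Every other header line is ignored on purpose.
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            body = (part.get_payload(decode=True) or b"").decode("latin-1")
            yield from WORD_RE.findall(body.lower())


def build_database(raw_messages):
    """Count, per class, how many messages each token appeared in."""
    db = {"ham": Counter(), "spam": Counter(), "nham": 0, "nspam": 0}
    for raw in raw_messages:
        msg = message_from_string(raw)
        label = label_from_subject(msg)
        if label is None:
            continue  # not a contributed sample
        db[label].update(set(tokenize(msg)))
        db["n" + label] += 1
    return db
```

The result can be saved with pickle.dump(db, f) on a file opened in
binary mode.  A real SpamBayes database would be built by its own
classifier, so treat this only as an illustration of the idea.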



