[Spambayes] RE: Do you need to continue training ham?
rcoe at CambridgeMA.GOV
Fri Sep 19 13:07:23 EDT 2003
So to emphasize the "hamminess" of a group of messages (e.g., those from a given correspondent or sent to a given list), you could move them to your "Ambiguous" folder and then click "Recover from Spam". Right? Would that approach help those who have complained of a large number of false positives without forcing them into a complete retrain?
MIS Department, City of Cambridge
831 Massachusetts Ave, Cambridge MA 02139 · 617-349-4217 · fax 617-349-6165
> -----Original Message-----
> From: Tim Peters [mailto:tim.one at comcast.net]
> Sent: Thursday, September 18, 2003 7:38 PM
> To: Rob Rosenfeld; spambayes at python.org
> Subject: RE: [Spambayes] Do you need to continue training ham?
> [Rob Rosenfeld]
> > Hey folks. I have moved from SpamAssassin to the SpamBayes Outlook
> > plug-in. The integration is great. I'm a bit confused about one
> > part. I had stockpiles of ham and spam to initially train SpamBayes
> > with.
> Note that spambayes works best if you train on an approximately equal number
> of each. It doesn't take millions <wink>, either. For example, I started
> this project, and my home Outlook classifier still hasn't been trained on
> 2000 messages total (I get about 600 per day, and my classifier database is
> going on one year old, so I've trained on less than 1% of the email I've
> received in that time).
> > If I understand correctly, every time SpamBayes detects and
> > moves a spam, it trains on it, kind of giving it "ongoing" spam
> > training. Is that correct?
> > If it doesn't move it as spam, does it train on it as ham?
> Not that either. It auto-trains on messages for which you explicitly click
> the "Recover from Spam" or "Delete as Spam" buttons. In addition, it *may*
> train on messages *you* move to spam or ham folders, depending on which
> boxes you've checked in the spambayes Manager's Training tab, section
> "Incremental Training". These aren't necessarily ideal training protocols,
> but they're the best we've been able to implement so far that most users
> seem able to deal with. Ideal would be to train on a small random sample of
> all the email you get, and expire training messages over time too. That
> seems hard.
> Spambayes at python.org
> Check the FAQ before asking: http://spambayes.sf.net/faq.html
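For what it's worth, here is a rough sketch of the "ideal" protocol described in the quoted message: train on a small random sample of incoming mail and expire old training messages over time. It does not use the real SpamBayes API. The class SampledTrainer, the learn/unlearn callbacks, and the default sample_rate and max_age values are all hypothetical names and numbers chosen for illustration (the 1% rate just echoes the "less than 1%" figure above).

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class TrainedMessage:
    msg_id: str
    is_spam: bool
    trained_at: float  # seconds since the epoch


class SampledTrainer:
    """Hypothetical helper: train on a random sample, expire old entries.

    learn(msg_id, is_spam) and unlearn(msg_id, is_spam) are assumed
    callbacks into whatever classifier is in use; they are not SpamBayes calls.
    """

    def __init__(self, learn, unlearn, sample_rate=0.01,
                 max_age=90 * 86400, rng=None):
        self.learn = learn
        self.unlearn = unlearn
        self.sample_rate = sample_rate  # fraction of mail to train on
        self.max_age = max_age          # seconds a message stays trained
        self.rng = rng or random.Random()
        self.trained = deque()          # oldest first; assumes 'now' never goes backwards

    def offer(self, msg_id, is_spam, now):
        """Train on this message with probability sample_rate."""
        self.expire(now)
        if self.rng.random() >= self.sample_rate:
            return False
        self.learn(msg_id, is_spam)
        self.trained.append(TrainedMessage(msg_id, is_spam, now))
        return True

    def expire(self, now):
        """Untrain every message that is older than max_age."""
        while self.trained and now - self.trained[0].trained_at > self.max_age:
            old = self.trained.popleft()
            self.unlearn(old.msg_id, old.is_spam)

    def counts(self):
        """Return (ham, spam) counts currently in the training set."""
        spam = sum(1 for m in self.trained if m.is_spam)
        return len(self.trained) - spam, spam
```

The counts() method is there only so you can check whether the sample is staying roughly balanced between ham and spam, as recommended above. A pure random sample will mirror whatever ham/spam ratio your mailbox has, so keeping the balance would need something extra on top of this.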