[Spambayes] More ham than spam?
Missy
missyhmakr at hotmail.com
Tue Aug 31 15:17:41 CEST 2004
Can you tell me how to do this? I read the article mentioned, but I am
still not sure how to go about it.
Missy
-----Original Message-----
From: spambayes-bounces at python.org [mailto:spambayes-bounces at python.org] On
Behalf Of Ferino Mardo
Sent: Tuesday, August 31, 2004 6:05 AM
To: Kenny Pitt; Ferino Mardo; spambayes at python.org
Subject: RE: [Spambayes] More ham than spam?
Replies below:
> -----Original Message-----
> From: Kenny Pitt [mailto:kennypitt at hotmail.com]
> Sent: Monday, August 30, 2004 07:40 PM
> To: Ferino Mardo; spambayes at python.org
> Subject: RE: [Spambayes] More ham than spam?
>
>
> Ferino Mardo wrote:
> > The SPAMbayes manager complains that I have much more ham than spam.
> > What should one do? Delete his good emails to make things even?
>
> We hear this question a lot, but most people find that they have too
> much *spam* and not enough ham. Ham messages typically have a more
> consistent set of senders, receivers, and topics, and therefore
> usually require less training to identify correctly than spam
> messages.
>
> Did you have SpamBayes train itself on some of your existing messages
> when you first configured? If so, you probably had a lot more ham
> messages in your initial training set.
>
Yes, I did. I have lots of emails I consider good and only a few spam messages.
I was just curious whether the message means anything beyond the obvious.
> If you are getting acceptable accuracy from SpamBayes then don't worry
> too much about the warning. It's only a guideline, and how much
> effect the imbalance has will depend on how severe the imbalance is as
> well as on your specific mixture of e-mails.
>
I'm getting more than acceptable accuracy from SPAMbayes. I like the
product!
> On the other hand, if your accuracy is poor then I would recommend
> deleting your training data and retraining SpamBayes from scratch with
> no initial training data.
> Instead, just train manually on any Unsure messages as well as
> messages that SpamBayes identifies incorrectly (ham classified as spam
> or vice versa). We usually refer to this training strategy as "Train
> on Errors and Unsures", and you can read more about it on the
> SpamBayes wiki:
>
> http://entrian.com/sbwiki/TrainOnErrorsAndUnsures
>
> You can also get more information about alternative training strategies
> here:
> http://entrian.com/sbwiki/TrainingIdeas
--
Kenny Pitt
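
For anyone wondering what "Train on Errors and Unsures" amounts to in practice, here is a minimal sketch of the idea in Python. It does not use the SpamBayes API: ToyClassifier, verdict, train_on_errors_and_unsures, and the two cutoff values are all hypothetical names and numbers made up for illustration. The only point it tries to show is the rule itself: start with no training data, and train on a message only when the filter is unsure about it or gets it wrong.

```python
from collections import Counter

# Illustrative thresholds only; assumed for this sketch, not read from any real config.
HAM_CUTOFF = 0.20
SPAM_CUTOFF = 0.90


class ToyClassifier:
    """Hypothetical stand-in for a real filter; this is NOT the SpamBayes API."""

    def __init__(self):
        self.spam_tokens = Counter()
        self.ham_tokens = Counter()

    def train(self, text, is_spam):
        # Count each distinct token once per message.
        target = self.spam_tokens if is_spam else self.ham_tokens
        target.update(set(text.lower().split()))

    def score(self, text):
        """Return a spam score in [0, 1]; 0.5 means no evidence either way."""
        spam_hits = ham_hits = 0
        for token in set(text.lower().split()):
            spam_hits += self.spam_tokens[token]
            ham_hits += self.ham_tokens[token]
        total = spam_hits + ham_hits
        return 0.5 if total == 0 else spam_hits / total


def verdict(score):
    """Map a score to 'spam', 'ham', or 'unsure' using the cutoffs above."""
    if score >= SPAM_CUTOFF:
        return "spam"
    if score <= HAM_CUTOFF:
        return "ham"
    return "unsure"


def train_on_errors_and_unsures(classifier, labelled_messages):
    """Train only on messages the classifier gets wrong or is unsure about.

    labelled_messages is an iterable of (text, is_spam) pairs.
    Returns the number of messages that were actually trained on.
    """
    trained = 0
    for text, is_spam in labelled_messages:
        expected = "spam" if is_spam else "ham"
        # An 'unsure' verdict and a wrong verdict both differ from the expected label.
        if verdict(classifier.score(text)) != expected:
            classifier.train(text, is_spam)
            trained += 1
    return trained
```

Feeding this loop a stream of labelled messages trains on the first few (everything is unsure at the start) and then skips whatever the classifier already gets right, which is why the training set stays small under this strategy.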
_______________________________________________
Spambayes at python.org
http://mail.python.org/mailman/listinfo/spambayes
Check the FAQ before asking: http://spambayes.sf.net/faq.html