[Spambayes] Spam Bayes use in a corporation

Brent Bertram bbertram at mts.net
Thu May 20 00:15:39 EDT 2004


Thank you everyone for the great news and information.  Everyone was worried
it would spread through the corporation and we would not be licensed for all
the copies.  I told them not to be concerned but they had FREEWARE stuck on
their minds and told me to do my research.  So I went straight to the source
for the info, you guys.  It's all good now, thanks again.


----- Original Message -----
From: "Tim Peters" <tim.one at comcast.net>
To: "'Michael C. Neel'" <neel at mediapulse.com>; "'Brent Bertram'"
<bbertram at mts.net>; <spambayes at python.org>
Sent: Wednesday, May 19, 2004 9:53 PM
Subject: RE: [Spambayes] Spam Bayes use in a corporation


> [Michael C. Neel]
> > As said, SpamBayes is under an OpenSource license.  There is no money
> > involved, but there is one simple rule it all boils down to:  You can
> > take it, use it, sell it, give it away, and even change it - but when
> > you change it you have to make those changes available to anyone who
> > wants them.  Posting them to the list or sending them in as a patch
> > would be enough.
>
> The PSF license SpamBayes is released under does not require that.  People
> are free to build proprietary ("closed source") software incorporating any
> or all of the SpamBayes code, and keep their changes secret, if that's what
> they want (they won't really be doing themselves or their users a favor by
> keeping their code secret, but it takes experience rather than arguments to
> understand why that's so).  The PSF license does require that derivative
> works include "a brief summary of the changes made" to the SpamBayes code
> they incorporate, but "brief summary" means what it says.  For example,
> "replaced classifier.py's probability calculations with a secret algorithm"
> is good enough (if that's what they did).
>
> Requiring derivative works to be released under a particular kind of license
> is a feature of *some* open source licenses, most notably the GPL.  But it's
> not part of the definition of Open Source as promulgated by the Open Source
> Initiative, and the PSF and GPL licenses are both certified as Open Source
> by the OSI:
>
>     http://www.opensource.org/
>
> > Read up on the downsides of whitelisting and blacklisting so you are
> > ready to show why a bayes filter is better.  Then the reason SpamBayes
> > works better than the others is most Bayes filters use two "buckets" -
> > good and bad; spambayes uses 3 - good, bad, and unknown.  This helps
> > because Dr Bayes's filter never expected the data to try and fool him,
> > i.e. spam acting like ham.
>
> Or vice versa.  Some messages are plain ambiguous, and require human
> judgment to classify correctly.  I'm still delighted at how well SpamBayes
> usually manages to isolate those.
>
> > And last thing I'll add is my own personal results.  I have two accounts
> > filtered with spambayes, one gets about 20 hams a day and 400 spams,
> > the other gets maybe 5 hams a week and 30 spams a day.  With spambayes
> > trained on a set of ~100 spam and ~100 ham for each account, I see
> > maybe 5 suspects a day for both accounts combined.  After a few weeks
> > there are no more mislabeled spams as hams and vice versa (these are
> > rare to start with).  No other spam tool I know of is this good.
>
> I see you took the advice to keep training data balanced to heart.  Good for
> you!  It really does work best that way, and we still don't have a good
> approach to living with badly unbalanced training data.  Then again, nobody
> pays me to think about that either <wink>.
>
>
>



