[Spambayes] full o' spaces
lists at morpheus.demon.co.uk
Sat Mar 8 21:18:55 EST 2003
Tim Stone - Four Stones Expressions <tim at fourstonesExpressions.com> writes:
> 3/8/2003 2:45:00 AM, Anthony Baxter <anthony at interlink.com.au> wrote:
>>We can sit here for days, weeks and months and think of ways to defeat
>>the existing classifier. We have done that, in the past. But a change that
>>is not tested and shown to improve existing results, does _not_ belong
>>in the code base. It goes against _everything_ that has made this project
> Ok, so let me summarize what I think our discussion has boiled down to.
> 1. We will not make changes that regress our results on existing spam.
> 2. We will engage in ongoing analysis of spam, keeping our testing corpora up
> to date as best we can. When significant (we have yet to define significant)
> amounts of FN start happening, we will adjust the tokenizer appropriately.
> Point 1 is a given. There seems to be considerable inertia in the project
> toward using point 2 as an ongoing strategy. I can live with it, because
> there's tremendous value in what we're doing, and it really does work. I just
> have to say, though, that from a marketing viewpoint (believe it or not, I was
> a marketer in a former life), this strategy can potentially shoot us in the
> foot, because we aren't the ones finding problems, spammers are, and I think
> this could cause our users to lose faith in our product. "I trained this
> stuff as spam, and this thing STILL doesn't catch it." If that happens to a
> user more than a few times, the conclusion will be that it doesn't work. I'm
> telling you, it doesn't take but one bad article in a ZD publication, and it's
> all over with for us.
> Ok, I'm off my soapbox. <smile> This has been a great discussion.
Can I borrow that box for a moment? Thanks... :-)
The key point, for me, is that spambayes is the only anti-spam tool I
have ever used that made a real dent in my spam problem. And the
dent it made was pretty much total. While I still get unsures, and
even the occasional FN, in reality I don't have a spam problem any more.
I don't know why spambayes is so good, but the single most distinctive
aspect of the project is the rigorous analysis of results, and
ruthless refusal to include techniques which don't pull their weight.
When I mention spambayes to friends, my "marketing" approach is,
1. It works. Really well.
2. It learns what you consider spam, and acts on that.
3. It's been tested on thousands of spam, with error rates so low as
to be negligible.
4. You do need to maintain it - a little ongoing training helps (but
it's not a major task, and if you don't bother, you're still going
to get very impressive results)
5. Er. But it's a bit rough around the edges still. I'll help you
install it, if you like.
Notice (5). That's what is killing us right now with real people (me,
I'm a figment of your imagination: be very afraid <wink>). Anything
else is minor.
Your point (2) means that we can claim that we know it works - we've
tested it (my point (3)). Pre-emptive attempts to address possible new
spam tricks lose that - you can't *prove* the effectiveness of a new
technique if you don't have corpora with evidence of that technique to
test against. I view the benefit of being able to show proof that the
program works as greater than the risk of being branded reactive.
Oh, and by the way - you use Microsoft's security strategy to
demonstrate that a reactive approach is bad. But that's FUD. Another
business that is (as far as the general public is aware) totally
reactive is the anti-virus business. If you liken the spambayes
approach to an anti-virus strategy, it suddenly looks much better :-)
OK, who wants the box next?
This signature intentionally left blank