[Spambayes] full o' spaces
Tim Stone - Four Stones Expressions
tim at fourstonesExpressions.com
Sat Mar 8 09:25:21 EST 2003
3/8/2003 2:45:00 AM, Anthony Baxter <anthony at interlink.com.au> wrote:
>We can sit here for days, weeks and months and think of ways to defeat
>the existing classifier. We have done that, in the past. But a change that
>is not tested and shown to improve existing results, does _not_ belong
>in the code base. It goes against _everything_ that has made this project
Ok, so let me summarize what I think our discussion has boiled down to.
1. We will not make changes that regress our results on existing spam.
2. We will engage in ongoing analysis of spam, keeping our testing corpora up
to date as best we can. When significant (we have yet to define "significant")
numbers of false negatives (FN) start occurring, we will adjust the tokenizer
accordingly.
Point 1 is a given. There seems to be considerable inertia in the project
toward using point 2 as an ongoing strategy. I can live with it, because
there's tremendous value in what we're doing, and it really does work. I just
have to say, though, that from a marketing viewpoint (believe it or not, I was
a marketer in a former life), this strategy can potentially shoot us in the
foot, because we aren't the ones finding the problems; the spammers are. I think
this could cause our users to lose faith in our product. "I trained this
stuff as spam, and this thing STILL doesn't catch it." If that happens to a
user more than a few times, the conclusion will be that it doesn't work. I'm
telling you, it takes only one bad article in a ZD publication, and it's
all over for us.
Ok, I'm off my soapbox. <smile> This has been a great discussion.
c'est moi - TimS