[Spambayes] New chi improvement

Gary Robinson grobinson at transpose.com
Wed Apr 28 22:24:31 EDT 2004


Hi,

After a looong time I have a test setup to test an idea I've had in mind for
a year or so. I tested it and it looks like it actually helps. (At least the
test result was statistically significant by my count.)

Abstract:

> One of the many techniques recently employed for filtering spam is the one
> described in the Linux Journal article "A Statistical Approach to the Spam
> Problem". This technique incorporates ideas from the seminal article "A Plan
> for Spam" as well as R. A. Fisher's technique for combining p-values by means
> of the chi-square distribution. The technique presented here takes the
> chi-square-based approach a step further by taking two facts into account:
> a) there is redundancy in the token probabilities, and b) spam and ham emails
> have different amounts of such redundancy. Fivefold cross-validation, described
> here, was carried out to test whether accounting for these factors actually
> leads to better performance. The results were positive and statistically
> significant.

http://www.garyrobinson.net/2004/04/improved_chi.html
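
For readers less familiar with the baseline chi technique the abstract builds on, here is a minimal sketch of Fisher's method: under the null hypothesis, -2 * sum(ln p_i) over n independent p-values follows a chi-square distribution with 2n degrees of freedom. The function names chi2q, fisher_combine and chi_score are hypothetical, and the spam/ham combination (1 + S - H) / 2 is modeled on how SpamBayes combines the two tests as I understand it. This is only the original chi-combining, not the new redundancy-aware technique described in the linked article.

```python
import math


def chi2q(x2: float, v: int) -> float:
    """P(chi-square with v degrees of freedom >= x2), for even v only."""
    if v % 2:
        raise ValueError("v must be even")
    m = x2 / 2.0
    # For even v the survival function is exp(-m) * sum_{i < v/2} m**i / i!.
    # No special underflow handling: exp(-m) simply becomes 0.0 for huge m.
    term = total = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def fisher_combine(pvalues):
    """Fisher's method: combine n p-values into a single p-value."""
    n = len(pvalues)
    x2 = -2.0 * sum(math.log(p) for p in pvalues)
    return chi2q(x2, 2 * n)


def chi_score(token_probs):
    """Hypothetical chi-combined spam score in [0, 1].

    token_probs are per-token spam probabilities f(w), each strictly
    between 0 and 1. Tokens are treated as independent, which is exactly
    the assumption the redundancy discussion above calls into question.
    """
    # S: evidence that the (1 - f(w)) values are too small to be random.
    s = 1.0 - fisher_combine([1.0 - p for p in token_probs])
    # H: the same test applied to the f(w) values themselves.
    h = 1.0 - fisher_combine(token_probs)
    return (1.0 + s - h) / 2.0
```

With only one p-value Fisher's method returns that p-value unchanged, and when every token probability is 0.5 the S and H tests cancel exactly, giving a neutral score of 0.5.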

If you're interested in the chi technique, I hope you'll take a look; I look
forward to any comments!

--Gary

-- 
Putting http://wecanstopspam.org in your email helps it pass through
overzealous spam filters.

Gary Robinson
CEO
Transpose, LLC
grobinson at transpose.com
207-942-3463
Company: http://www.transpose.com
Blog:    http://www.garyrobinson.net
