[Spambayes] Mozilla and summary of bayes calcs

Gary Robinson grobinson@transpose.com
Wed, 18 Sep 2002 21:01:27 -0400


That's really interesting. I had started down that general path of analysis
but stopped before I got to the point of seeing how it made sense. It might be
worthwhile to go further with your (2), but I understand the time constraints
and the other opportunities for improvement.

I'm going to point to your
http://mail.python.org/pipermail/python-dev/2002-August/028216.html post
from my essay.

BTW, have you looked at the papers on naive Bayesian classification (NBC)? I'm
curious why you are starting with the Graham approach rather than the
NBC approach for spambayes. I'm very interested in the question of which
performs better in real-life testing.

--Gary


-- 
Gary Robinson
CEO
Transpose, LLC
grobinson@transpose.com
207-942-3463
http://www.emergentmusic.com
http://radio.weblogs.com/0101454


> From: Tim Peters <tim.one@comcast.net>
> Date: Wed, 18 Sep 2002 20:23:38 -0400
> To: Gary Robinson <grobinson@transpose.com>
> Cc: SpamBayes <spambayes@python.org>
> Subject: RE: [Spambayes] Mozilla and summary of bayes calcs
> 
> BTW, I'm not a statistician, but I think there's "an obvious" sense in which
> Graham's formulation is Bayesian that's been overlooked -- although it's
> pretty well hidden <wink>.  I ranted about that earlier:
> 
> http://mail.python.org/pipermail/python-dev/2002-August/028216.html
> 
> Two footnotes to that:  (1) In email, Paul later said that the proportion of
> ham to spam in his inbox is in fact close to 1 to 1, not the 2 to 1 I had
> speculated there.  (2) I very briefly experimented with taking P(spam) and
> P(not-spam) into account (based on the training set proportion) as explained
> there, but doing so was neither a clear win nor a clear loss, and, at the
> time, I was getting big clear wins via other means so just dropped this line
> of experimentation.
>
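To make footnote (2) concrete, here is a minimal sketch of one standard naive-Bayes derivation (not necessarily the exact argument in the linked post). If each p_i is an estimate of P(spam | word_i) and words are assumed independent given the class, then P(spam | words) is proportional to P(spam)^(1-n) * prod(p_i), and likewise for not-spam with (1 - p_i). Graham's combining formula drops the prior factor, which makes it exact only when P(spam) = P(not-spam) = 0.5. The function names `combine` and `graham` and the `prior_spam` parameter are hypothetical, chosen for illustration.

```python
import math

def combine(probs, prior_spam=0.5):
    """Naive-Bayes combination of per-word spam probabilities (hypothetical helper).

    probs: per-word estimates of P(spam | word), each strictly between 0 and 1.
    prior_spam: assumed overall P(spam), strictly between 0 and 1;
                0.5 reproduces Graham's combining formula.
    """
    n = len(probs)
    prior_ham = 1.0 - prior_spam
    # Log space avoids underflow when many words are combined.
    log_spam = (1 - n) * math.log(prior_spam) + sum(math.log(p) for p in probs)
    log_ham = (1 - n) * math.log(prior_ham) + sum(math.log(1.0 - p) for p in probs)
    # Normalize the two unnormalized scores into a probability.
    m = max(log_spam, log_ham)
    s = math.exp(log_spam - m)
    h = math.exp(log_ham - m)
    return s / (s + h)

def graham(probs):
    """Graham-style combination: prod(p) / (prod(p) + prod(1 - p))."""
    num = math.prod(probs)
    return num / (num + math.prod(1.0 - p for p in probs))

probs = [0.99, 0.2, 0.7]
print(graham(probs), combine(probs), combine(probs, prior_spam=1 / 3))
```

With the prior at 0.5 the factor (P(spam)/P(not-spam))^(1-n) equals 1, so `combine` and `graham` agree; with a ham-to-spam ratio of 2 to 1 (P(spam) = 1/3) the same three words give a different score, which is the adjustment the footnote describes trying.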