[Python-Dev] GBayes design
Thu, 05 Sep 2002 19:19:57 +0200
Raymond Hettinger wrote:
> Is it too late to challenge a core design decision?
> Instead of multiplying probabilities, use fuzzy logic methods.
> Classify the indicators into damning, strong, weak, neutral, ...
> After counting the number of indicators in each class, make
> a spam/ham decision that can be easily tweaked. This would
> make it easy to implement variations of Tim's recent clear
> win, where additional indicators are gathered until the
> balance shifts sharply to one side.
> Some other advantages are:
> -- easily interpreted score vectors (6 damning, 7 strong, 4 weak, ... )
> -- avoids mathematical issues with indicators not being independent
> -- allows the addition of non-token based indicators. for instance,
> a preponderance of caps would be a weak indicator. the presence
> of caps separated by spaces would be a strong indicator.
> -- the decision logic would be more intuitive
> -- avoids the issue of having equal amounts of spam and ham in
> the sample
> The core concept would stay the same -- it's really just a shift from
> continuous to discrete.
Hmm, there's nothing discrete about fuzzy logic (ok, this
claim is 0.65% true ;-)
The problem is more about multi-dimensional optimization where
you are interested in distilling several different inputs
into one value.
A weighted average is the simplest form to use here and there
are various multi-dimensional optimization algorithms around
to aid in finding the "optimal" weights.
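To make the weighted-average idea concrete, here is a minimal sketch.
The function name, the indicator values, the weights and the 0.5 cutoff
are all made up for illustration; real weights would have to come out of
whatever optimization you run over a training corpus.

```python
def weighted_score(inputs, weights):
    """Collapse several indicator values into a single score."""
    if len(inputs) != len(weights):
        raise ValueError("need exactly one weight per input")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Normalise by the weight total so the result stays on the
    # same scale as the inputs.
    return sum(w * x for w, x in zip(weights, inputs)) / total


# Hypothetical indicator values (0.0 = looks like ham, 1.0 = looks like spam)
# and hand-picked weights; neither comes from a real classifier.
indicators = [0.99, 0.80, 0.20]
weights = [3.0, 1.0, 1.0]

score = weighted_score(indicators, weights)
is_spam = score > 0.5  # hypothetical cutoff
```

The interesting part is not this function but choosing the weights,
which is where the optimization algorithms come in.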
Another approach would be using a shallow neural network.
The only "problem" with these is that Tim generates a
variable number of inputs, AFAICT, so that you'd have
to use some preprocessing to make the number of inputs fixed.
Would make a nice internship project, I guess :-)
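On the variable number of inputs: one possible preprocessing step
(an assumption on my part, not a description of what Tim's code does)
is to keep only the n values that lie farthest from a neutral 0.5 and
pad with 0.5 when there are fewer than n. The name fixed_width and the
default n=15 are arbitrary choices for this sketch.

```python
def fixed_width(probs, n=15, neutral=0.5):
    """Return exactly n values: the n most extreme ones, padded with neutral."""
    # Rank by distance from the neutral point, most extreme first.
    ranked = sorted(probs, key=lambda p: abs(p - neutral), reverse=True)
    picked = ranked[:n]
    # Pad short inputs so the result always has length n.
    return picked + [neutral] * (n - len(picked))


# Hypothetical per-token spam probabilities for two messages of
# different length; both come out as vectors of length 3.
short_msg = fixed_width([0.9], n=3)
long_msg = fixed_width([0.9, 0.5, 0.01, 0.6], n=3)
```

The resulting fixed-length vector could then be fed to a weighted
average or a small network.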
CEO eGenix.com Software GmbH
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting: http://www.egenix.com/
Python Software: http://www.egenix.com/files/python/