# [Python-Dev] GBayes design

**M.-A. Lemburg**
mal@egenix.com

*Thu, 05 Sep 2002 19:19:57 +0200*

Raymond Hettinger wrote:
> Is it too late to challenge a core design decision?
>
> Instead of multiplying probabilities, use fuzzy logic methods.
> Classify the indicators into damning, strong, weak, neutral, ...
>
> After counting the number of indicators in each class, make
> a spam/ham decision that can be easily tweaked. This would
> make it easy to implement variations of Tim's recent clear
> win, where additional indicators are gathered until the
> balance shifts sharply to one side.
>
> Some other advantages are:
>
> - easily interpreted score vectors (6 damning, 7 strong, 4 weak, ...)
> - avoids mathematical issues with indicators not being independent
> - allows the addition of non-token-based indicators. For instance,
>   a preponderance of caps would be a weak indicator; the presence
>   of caps separated by spaces would be a strong indicator.
> - the decision logic would be more intuitive
> - avoids the issue of having equal amounts of spam and ham in
>   the sample
>
> The core concept would stay the same -- it's really just a shift from
> continuous to discrete.

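As a rough illustration of the proposal quoted above, here is a minimal sketch in Python. It assumes each indicator is a token spam probability in [0, 1] and buckets it by its distance from 0.5. The cut-offs, the weights in `WEIGHTS`, the `margin`, and the names `classify_indicator`, `score_vector` and `decide` are all hypothetical, not taken from GBayes; the weighted tally is just one easily tweaked decision rule among many.

```python
from collections import Counter

# Hypothetical cut-offs on |p - 0.5|, checked from strongest to weakest.
# These are illustrative values, not GBayes parameters.
CLASSES = [
    ("damning", 0.49),
    ("strong", 0.40),
    ("weak", 0.20),
]

# Hypothetical per-class weights for the decision rule; tweak by hand.
WEIGHTS = {"damning": 10, "strong": 3, "weak": 1}


def classify_indicator(prob):
    """Map a token's spam probability to (class_name, direction)."""
    distance = abs(prob - 0.5)
    direction = "spam" if prob > 0.5 else "ham"
    for name, cutoff in CLASSES:
        if distance >= cutoff:
            return name, direction
    return "neutral", None


def score_vector(probs):
    """Count indicators per (class, direction), e.g. {('strong', 'spam'): 7}."""
    counts = Counter()
    for p in probs:
        name, direction = classify_indicator(p)
        if direction is not None:  # neutral indicators are not counted
            counts[(name, direction)] += 1
    return counts


def decide(counts, margin=5):
    """Return 'spam', 'ham' or 'unsure' from the weighted class counts."""
    balance = 0
    for (name, direction), n in counts.items():
        sign = 1 if direction == "spam" else -1
        balance += sign * WEIGHTS[name] * n
    if balance >= margin:
        return "spam"
    if balance <= -margin:
        return "ham"
    return "unsure"
```

For example, `decide(score_vector([0.999, 0.95, 0.8, 0.6, 0.02]))` returns `'spam'`, and the intermediate `Counter` is exactly the kind of readable score vector the proposal describes.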

Hmm, there's nothing discrete about fuzzy logic (ok, this
claim is 0.65% true ;-)

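To make the point about fuzzy logic concrete: fuzzy sets give each input a membership degree anywhere between 0 and 1, so a single token can be partly "weak" and partly "strong" at the same time. The sketch below assumes triangular membership functions over a token's distance from 0.5; the set boundaries and the names `triangular`, `FUZZY_SETS` and `memberships` are hypothetical.

```python
def triangular(x, left, peak, right):
    """Triangular fuzzy membership: 0 outside (left, right), 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy sets over d = |p - 0.5|, which ranges from 0 to 0.5.
FUZZY_SETS = {
    "neutral": (-0.25, 0.0, 0.25),
    "weak": (0.0, 0.25, 0.5),
    "strong": (0.25, 0.5, 0.75),
}


def memberships(prob):
    """Degree (0..1) to which a token probability belongs to each set."""
    d = abs(prob - 0.5)
    return {name: triangular(d, *abc) for name, abc in FUZZY_SETS.items()}
```

With these sets, `memberships(0.9)` gives roughly 0.6 "strong", 0.4 "weak" and 0 "neutral": a continuous grading rather than a single discrete class.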

The problem is more about multi-dimensional optimization where
you are interested in distilling several different inputs
into one value.


A weighted average is the simplest form to use here, and there
are various multi-dimensional optimization algorithms around
to help find the "optimal" weights.

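A minimal sketch of that idea, assuming each message has already been reduced to a fixed number of scores in [0, 1] (say, a token-based score and a caps-based score) and labelled 1.0 for spam or 0.0 for ham. The weights are fitted with compass search, a simple derivative-free multi-dimensional optimizer; it is only one possible choice, and the function names are hypothetical.

```python
def weighted_average(inputs, weights):
    """Combine several scores in [0, 1] into one value in [0, 1]."""
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, inputs)) / total


def mean_squared_error(weights, samples):
    """samples: list of (inputs, label) pairs, label 1.0 = spam, 0.0 = ham."""
    if sum(weights) <= 0:
        return float("inf")  # all-zero weights give no average at all
    errs = [(weighted_average(x, weights) - y) ** 2 for x, y in samples]
    return sum(errs) / len(errs)


def compass_search(samples, n_inputs, step=0.5, min_step=1e-3, max_rounds=10_000):
    """Derivative-free compass (pattern) search over non-negative weights.

    Tries moving each weight up and down by `step`, keeps any move that
    lowers the error, and halves `step` when no move helps.
    """
    weights = [1.0] * n_inputs
    best = mean_squared_error(weights, samples)
    for _ in range(max_rounds):  # guard against endless tiny improvements
        if step < min_step:
            break
        improved = False
        for i in range(n_inputs):
            for delta in (step, -step):
                trial = list(weights)
                trial[i] = max(0.0, trial[i] + delta)  # keep weights >= 0
                err = mean_squared_error(trial, samples)
                if err < best:
                    weights, best, improved = trial, err, True
        if not improved:
            step /= 2
    return weights, best
```

On the toy data `[([0.9, 0.5], 1.0), ([0.8, 0.1], 1.0), ([0.2, 0.9], 0.0), ([0.1, 0.5], 0.0)]`, where only the first input tracks the label, `compass_search(samples, 2)` drives the second weight to zero.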

Another approach would be to use a shallow neural network.
The only "problem" with these is that Tim generates a
variable number of inputs, AFAICT, so you'd have
to use some preprocessing to make the number of inputs
constant.

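One possible preprocessing step, sketched under stated assumptions: reduce the variable-length list of token probabilities to a fixed-length histogram (the fraction of tokens falling into each of `N_BINS` equal-width bins), then feed it to a single sigmoid unit, the smallest possible "network", trained by gradient descent on log loss. A real shallow network would add a hidden layer; the bin count, learning rate, epoch count and all function names here are hypothetical.

```python
import math

N_BINS = 5  # hypothetical; any fixed size works


def to_fixed_vector(probs, n_bins=N_BINS):
    """Turn a variable-length list of token probabilities into n_bins features.

    Each feature is the fraction of the message's probabilities that fall
    into one equal-width bin of [0, 1], so every message yields the same
    number of inputs.
    """
    counts = [0] * n_bins
    for p in probs:
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        counts[idx] += 1
    total = len(probs) or 1  # avoid dividing by zero for empty messages
    return [c / total for c in counts]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_unit(vectors, labels, lr=0.5, epochs=2000):
    """Fit one sigmoid unit with per-sample gradient descent on log loss."""
    w = [0.0] * len(vectors[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            pred = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            grad = pred - y  # d(log loss)/d(pre-activation)
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


def predict(w, b, probs):
    """Spam score in (0, 1) for a message of any length."""
    x = to_fixed_vector(probs)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
```

Once trained, e.g. with `train_unit([to_fixed_vector(m) for m in messages], labels)`, the unit maps any message, whatever its token count, to a single score between 0 and 1.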

Would make a nice internship project, I guess :-)

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting: http://www.egenix.com/
Python Software: http://www.egenix.com/files/python/