[Python-Dev] GBayes design

Raymond Hettinger python@rcn.com
Thu, 5 Sep 2002 11:43:20 -0400

Is it too late to challenge a core design decision?

Instead of multiplying probablities, use fuzzy logic methods.
Classify the indicators into damning, strong, weak, neautral, ...

After counting the number of indicators in each class, make
a spam/ham decision that can be easily tweaked.  This would
make it easy to implement variations of Tim's recent clear
win, where additional indicators are gathered until the
balance shifts sharply to one side.

Some other advantages are:
-- easily interpreted score vectors (6 damning, 7 strong, 4 weak, ... )
-- avoids mathematical issues with indicators not being independent
-- allows the addition of non-token based indicators.  for instance,
    a preponderance of caps would be a weak indicator.  the presence
    of caps separated by spaces would be a strong indicator.
-- the decision logic would be more intuitive
-- avoids the issue of having equal amounts of spam and ham in
    the sample

The core concept would stay the same -- it's really just a shift from
continuous to discrete.

of-course-this-is-entirely-outside-my-fields-of-knowledge-ly yours,

Raymond Hettinger