[spambayes-bugs] [ spambayes-Feature Requests-783846 ] Generalize!
SourceForge.net
noreply at sourceforge.net
Wed Aug 20 15:36:08 EDT 2003
Feature Requests item #783846, was opened at 2003-08-06 00:31
Message generated for change (Comment added) made by simnick66
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498106&aid=783846&group_id=61702
Category: None
Group: None
Status: Open
Priority: 5
Submitted By: Nicolas Kruchten (simnick66)
Assigned to: Nobody/Anonymous (nobody)
Summary: Generalize!
Initial Comment:
I think that the spambayes engine (which works great,
btw) could be generalized to classify far more than just
spam; it could completely replace rule-based filtering.
As I understand it (correct me if I'm wrong!), there is
nothing spam-specific about this software: it could be
used to classify any message into any folder while
retaining its spam-filtering abilities.
The way this would work would be to specify a default
folder (the inbox) and then a set of other folders (e.g.
spam, folder1, folder2, folder3). When a message
comes in, it could be shuttled to the appropriate folder if
the threshold for that folder's Bayesian rating is crossed.
If multiple thresholds are crossed, it could then go into
an unsure folder, or the values could be compared to
see where it should go. Each folder could have
an independently set threshold...
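The routing rule described above can be sketched roughly as follows. This is only an illustration of the idea, not spambayes code: the `route` function, its parameters, and the assumption that each folder already has a score between 0 and 1 are all hypothetical.

```python
from typing import Dict


def route(scores: Dict[str, float],
          thresholds: Dict[str, float],
          default: str = "inbox",
          unsure: str = "unsure",
          compare_scores: bool = False) -> str:
    """Pick a destination folder from per-folder scores in [0, 1].

    `scores` and `thresholds` are hypothetical inputs keyed by folder
    name; how the scores are produced is outside this sketch.
    """
    # Folders whose own, independently set threshold was crossed.
    crossed = [f for f, s in scores.items() if s >= thresholds[f]]
    if not crossed:
        return default          # nothing claimed the message
    if len(crossed) == 1:
        return crossed[0]       # exactly one folder claimed it
    if compare_scores:
        # Several folders claimed it: take the highest score.
        return max(crossed, key=lambda f: scores[f])
    return unsure               # several folders claimed it: give up
```

For example, with thresholds `{"spam": 0.9, "folder1": 0.8}`, a message scoring 0.95 for spam and 0.1 for folder1 goes to `spam`, while one that crosses both thresholds goes to `unsure` unless `compare_scores` is set.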
Is there an easy way to hack the current Outlook plugin
to do this for testing purposes? (i.e. to install multiple
versions and spoof them into comparing different folders,
or... ?)
----------------------------------------------------------------------
>Comment By: Nicolas Kruchten (simnick66)
Date: 2003-08-20 21:36
Message:
Logged In: YES
user_id=838551
OK, I'll try it, thanks!
----------------------------------------------------------------------
Comment By: Tony Meyer (anadelonbrin)
Date: 2003-08-19 08:40
Message:
Logged In: YES
user_id=552329
You can now try Skip's n-way.py script in the contrib
directory in CVS, which runs the cascade of n spambayes
classifiers that Tim describes.
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2003-08-06 00:51
Message:
Logged In: YES
user_id=31435
The spambayes engine doesn't really know anything
about "ham" and "spam"; it starts for each user as a blank
slate and knows only what they teach it. So in that sense it
could be more general, although the tokenization strategies
were specifically designed for, and tuned on, the
ham-vs-spam task.
A higher hurdle is that spambayes is "deeply" a binary
(two-way) classifier. Classic Bayesian classifiers are N-way, but
spambayes threw away that part of the math, replacing it
with a two-way classifier based on very different theory.
I know at least one person tried to do N-way classification by
training N distinct spambayes classifiers, and running them in
sequence. That was quite a while ago, and I don't know how
happy they were with the results.
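A minimal sketch of that "N binary classifiers in sequence" idea, assuming each stage is just a callable that returns a score between 0 and 1. The names `cascade` and `keyword_scorer` are hypothetical, and the keyword scorer is a toy stand-in, not a spambayes classifier:

```python
from typing import Callable, List, Tuple

# A scorer maps message text to a score in [0, 1] (hypothetical interface).
Scorer = Callable[[str], float]


def cascade(message: str,
            stages: List[Tuple[str, Scorer, float]],
            default: str = "inbox") -> str:
    """Run two-way classifiers in order; the first one whose score
    reaches its cutoff claims the message."""
    for folder, scorer, cutoff in stages:
        if scorer(message) >= cutoff:
            return folder
    return default  # no stage claimed the message


def keyword_scorer(word: str) -> Scorer:
    """Toy stand-in for a trained classifier: 1.0 if `word` appears
    in the message (case-insensitive), otherwise 0.0."""
    def score(message: str) -> float:
        return 1.0 if word in message.lower() else 0.0
    return score


stages = [
    ("spam", keyword_scorer("viagra"), 0.9),
    ("lists", keyword_scorer("unsubscribe"), 0.9),
]
```

Note that the order of the stages matters in this sketch: a message that would satisfy several stages always lands in the folder of the earliest one.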
In any case, trying to hack the Outlook plugin will have you
crawling on your belly for a month <wink -- but dealing with
Outlook is massively complicated>.
----------------------------------------------------------------------