[Spambayes] Re: use as generic classifier? (not just spam/ham)

Gary Robinson grobinson at transpose.com
Fri Jan 31 12:45:04 EST 2003

I've been thinking about that myself. I wonder if anyone else on this list
has been?

There are traditional classification techniques for classifying documents.
Each document becomes a vector in n-space, and space is partitioned
according to subject area. But that technology is typically complex and

it would be interesting to take the whole spambayes spam/ham approach, and
run it separately for every folder. Just do the same exact thing, but
instead of the target being a junk mail folder, it would be a folder of some
subject matter.

So for each folder, instead of spam vs. nonspam, <that folder> vs. <not that
folder> would be calculated. Then the email would be allocated to the
folder with the greatest certainty.

That is, I wouldn't try to adapt the approach to work with more than a
binary choice at a time. I don't think that would work (at least not with
chi-square). But if you calculate the binary choice w/r/t each folder, the
one with the most certainty can be chosen.
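
A rough sketch of that per-folder idea in Python, not actual spambayes code:
every name here (one_vs_rest_sets, pick_folder, the Scorer type) is made up for
illustration, and each scorer is assumed to return a number between 0.0 and
1.0 expressing how certain it is that the message belongs in its folder.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration; none of these names come from spambayes.
Message = str
Scorer = Callable[[Message], float]  # certainty that a message belongs, 0.0..1.0


def one_vs_rest_sets(
    folders: Dict[str, List[Message]],
) -> Dict[str, Tuple[List[Message], List[Message]]]:
    """For each folder, split all mail into <that folder> vs. <not that folder>."""
    sets = {}
    for name, members in folders.items():
        # Everything filed anywhere else plays the role that "ham" plays today.
        others = [m for other, msgs in folders.items() if other != name for m in msgs]
        sets[name] = (list(members), others)
    return sets


def pick_folder(message: Message, scorers: Dict[str, Scorer]) -> Tuple[str, Dict[str, float]]:
    """Run every binary scorer separately and keep the most certain folder."""
    scores = {name: scorer(message) for name, scorer in scorers.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

Each folder's pair of training sets would feed one ordinary binary classifier,
so nothing in the scoring itself has to know that more than two outcomes exist.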

It's a little bit ugly, but if it gets the job done, so what.

I think it would work. It might not work quite as well as in the spam/ham
case, but it would be interesting to actually try it and see how well it
works.

It seems like the mods to spambayes to make it happen wouldn't be
super-huge; if enough people felt that there might be some benefit, it seems
like it might be worth doing...

Just my 2 cents!

> I was wondering if spambayes could be used in a different way. Currently
> I have a couple of users on a mailserver (imap+maildir). I'd like to
> make it possible to have them create different subfolders where they
> move mail to (from a mailinglist, about a specific subject, from a
> specific person - whatever) through their standard imap client. Now,
> when a new mail arrives on the mailserver, I'd like that mail to be
> "scored" against all these folders and subfolders (maildirs), and for it
> to be delivered to the folder that corresponds to the system's "best
> guess". If the scores for all folders are within a certain (smallish)
> range (percentage), the system's best guess is not good enough and it
> should just be delivered to the standard inbox.
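
The fallback rule in the quoted message could be sketched like this. It is
only an illustration: deliver_to, min_spread and the "INBOX" name are
assumptions, and "within a certain range" is read here as the gap between the
highest and lowest folder score.

```python
from typing import Dict


def deliver_to(scores: Dict[str, float], min_spread: float = 0.1,
               inbox: str = "INBOX") -> str:
    """Return the best-scoring folder, or the inbox when no folder stands out."""
    if not scores:
        return inbox
    spread = max(scores.values()) - min(scores.values())
    # If all folders score about the same, the best guess is not good enough.
    if spread <= min_spread:
        return inbox
    return max(scores, key=scores.get)
```

Note that with a single folder the spread is always zero, so under this reading
everything would go to the inbox; a real setup would probably also want an
absolute cutoff on the top score.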



Gary Robinson
Transpose, LLC
grobinson at transpose.com
