[Spambayes] Re: use as generic classifier? (not just spam/ham)

Gary Robinson grobinson at transpose.com
Fri Jan 31 12:45:04 EST 2003


I've been thinking about that myself. I wonder if anyone else on this list
has been?

There are traditional classification techniques for classifying documents.
Each document becomes a vector in n-space, and space is partitioned
according to subject area. But that technology is typically complex and
expensive.

It would be interesting to take the whole spambayes spam/ham approach, and
run it separately for every folder. Just do the same exact thing, but
instead of the target being a junk mail folder, it would be a folder of some
subject matter.

So for each folder, instead of spam vs. nonspam, <that folder> vs. <not that
folder> would be calculated. Then, the email would be allocated to the
folder with the greatest certainty.

That is, I wouldn't try to adapt the approach to work with more than a
binary choice at a time. I don't think that would work (at least not with
chi-square). But if you calculate the binary choice w/r/t each folder, the
one with the most certainty can be chosen.
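
A rough Python sketch of the per-folder idea, just to make it concrete. It
assumes a toy word-count scorer (smoothed log-odds), which is only a stand-in
and is NOT the chi-square combining that spambayes actually uses. The names
BinaryScorer, train_folders and classify are made up for illustration and are
not part of spambayes.

```python
import math
from collections import Counter


def tokenize(text):
    # Crude tokenizer (assumption): lowercase, split on whitespace, dedupe.
    return set(text.lower().split())


class BinaryScorer:
    """Toy <that folder> vs. <not that folder> scorer (hypothetical)."""

    def __init__(self):
        self.in_counts = Counter()   # token -> number of member messages
        self.out_counts = Counter()  # token -> number of non-member messages
        self.n_in = 0
        self.n_out = 0

    def train(self, text, is_member):
        tokens = tokenize(text)
        if is_member:
            self.in_counts.update(tokens)
            self.n_in += 1
        else:
            self.out_counts.update(tokens)
            self.n_out += 1

    def score(self, text):
        # Sum smoothed per-token log-odds, then squash into the range 0..1.
        log_odds = 0.0
        for tok in tokenize(text):
            p_in = (self.in_counts[tok] + 1) / (self.n_in + 2)
            p_out = (self.out_counts[tok] + 1) / (self.n_out + 2)
            log_odds += math.log(p_in / p_out)
        log_odds = max(-50.0, min(50.0, log_odds))  # avoid exp() overflow
        return 1.0 / (1.0 + math.exp(-log_odds))


def train_folders(folders):
    """folders: dict of folder name -> list of message texts."""
    scorers = {name: BinaryScorer() for name in folders}
    for name, scorer in scorers.items():
        for other, messages in folders.items():
            for msg in messages:
                # Mail from every other folder is the "not that folder" class.
                scorer.train(msg, is_member=(other == name))
    return scorers


def classify(scorers, text):
    # One binary decision per folder; pick the folder with the highest score.
    scores = {name: s.score(text) for name, s in scorers.items()}
    best = max(scores, key=scores.get)
    return best, scores


folders = {
    "python": ["python list help", "python import error"],
    "cooking": ["pasta recipe sauce", "bread recipe oven"],
}
scorers = train_folders(folders)
best, scores = classify(scorers, "recipe for pasta")
print(best, scores)
```

Each folder gets its own independent two-way classifier, so nothing in the
scoring itself has to know about more than a binary choice.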

It's a little bit ugly, but if it gets the job done, so what.

I think it would work. It might not work quite as well as in the spam/ham
case, but it would be interesting to actually try it and see how well it
did.

It seems like the mods to spambayes to make it happen wouldn't be
super-huge; if enough people felt that there might be some benefit, it seems
like it might be worth doing...

Just my 2 cents!


> 
> I was wondering if spambayes could be used in a different way. Currently
> I have a couple of users on a mailserver (imap+maildir). I'd like to
> make it possible to have them create different subfolders where they
> move mail to (from a mailinglist, about a specific subject, from a
> specific person - whatever) through their standard imap client. Now,
> when a new mail arrives on the mailserver, I'd like that mail to be
> "scored" against all these folders and subfolders (maildirs), and for it
> to be delivered to the folder that corresponds to the system's "best
> guess". If the scores for all folders are within a certain (smallish)
> range (percentage), the system's best guess is not good enough and it
> should just be delivered to the standard inbox.
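
The fallback rule in the quoted message could be sketched roughly as below.
This is only one reading of it: "within a certain range" is taken to mean the
gap between the highest and lowest folder score. The function name
choose_folder, the 0.2 default and the "INBOX" name are assumptions for
illustration.

```python
def choose_folder(scores, min_spread=0.2, default="INBOX"):
    """scores: dict of folder name -> score in 0..1 (hypothetical input)."""
    if not scores:
        return default
    spread = max(scores.values()) - min(scores.values())
    if spread < min_spread:
        # All folders scored about the same: the best guess is not good enough.
        return default
    return max(scores, key=scores.get)


print(choose_folder({"lists": 0.9, "family": 0.1}))
print(choose_folder({"lists": 0.5, "family": 0.55}))
```

Note that with a single folder the spread is always zero, so this version
would always deliver to the inbox; comparing the top score against a fixed
cutoff might be a better fit in that case.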


--Gary

-- 
[http://ThisURLEnablesEmailToGetThroughOverzealousSpamFilters.org]

Gary Robinson
CEO
Transpose, LLC
grobinson at transpose.com
207-942-3463
http://www.transpose.com
http://radio.weblogs.com/0101454




