[spambayes-bugs] [ spambayes-Support Requests-912781 ] Other uses?
SourceForge.net
noreply at sourceforge.net
Wed Mar 10 07:54:45 EST 2004
Support Requests item #912781, was opened at 2004-03-09 15:03
Message generated for change (Comment added) made by rprosser
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498104&aid=912781&group_id=61702
Category: None
Group: None
Status: Closed
Priority: 5
Submitted By: Richard Prosser (rprosser)
Assigned to: Nobody/Anonymous (nobody)
Summary: Other uses?
Initial Comment:
Can the code be readily used in non-email applications? I
need a filter that can be trained across different
categories (rather like POPFile), but for simple text input.
Thanks ...
Richard Prosser
----------------------------------------------------------------------
>Comment By: Richard Prosser (rprosser)
Date: 2004-03-10 12:54
Message:
Thanks for the replies. I really need multiple classification, so
I'll have a look at the nway.py script.
BTW, I also read recently about a researcher who had
discovered a new way to optimise the filtering, but now I
can't find the article :-(
----------------------------------------------------------------------
Comment By: Tony Meyer (anadelonbrin)
Date: 2004-03-10 00:07
Message:
Richie and I have both used the spambayes code for non-
email classification. He was classifying a music database,
IIRC, and I'm working with scripted dialogue. The tokeniser
has to be pretty much recreated from scratch, of course, but
if you want the same sort of chi-squared distribution, then it
should work fine.
With multiple categories, you can use spambayes, although,
as Kenny said, it's not designed for that. Check out the
nway.py script in the contrib directory, which Skip wrote
(I don't think he actually uses it, though). I've used it
too; the results aren't as good as they could be, but that
could be for other reasons. Looking at the POPFile code and
how they do it could very well be a better option (I keep
meaning to do this, but haven't got around to it).
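
One general way to get several categories out of a two-category
classifier is "one-vs-rest": keep one two-category scorer per
category, train each as "this category" versus "everything else",
and pick the category whose scorer is most confident. The sketch
below illustrates that idea only. It is not taken from nway.py or
POPFile, and TwoClassScorer is a hypothetical toy stand-in (smoothed
naive Bayes log-odds), not the spambayes chi-squared classifier. All
class and method names are assumptions made for illustration.

```python
import math
from collections import Counter


class TwoClassScorer:
    """Toy two-category scorer using smoothed naive Bayes log-odds.

    Hypothetical stand-in for a real two-category classifier; this is
    NOT the spambayes chi-squared combining scheme.
    """

    def __init__(self):
        # Per-side count of training inputs that contained each token.
        self.counts = {True: Counter(), False: Counter()}
        # Per-side count of training inputs seen.
        self.docs = {True: 0, False: 0}

    def learn(self, tokens, is_target):
        self.counts[is_target].update(set(tokens))
        self.docs[is_target] += 1

    def score(self, tokens):
        """Return the summed log-odds that tokens belong to the target side."""
        total = 0.0
        for tok in set(tokens):
            p_in = (self.counts[True][tok] + 1.0) / (self.docs[True] + 2.0)
            p_out = (self.counts[False][tok] + 1.0) / (self.docs[False] + 2.0)
            total += math.log(p_in / p_out)
        return total


class OneVsRest:
    """One two-category scorer per category; highest score wins."""

    def __init__(self, categories):
        self.scorers = {name: TwoClassScorer() for name in categories}

    def learn(self, tokens, category):
        tokens = list(tokens)
        for name, scorer in self.scorers.items():
            # The input is a positive example for its own category and a
            # negative example for every other category.
            scorer.learn(tokens, name == category)

    def classify(self, tokens):
        tokens = list(tokens)
        return max(self.scorers, key=lambda name: self.scorers[name].score(tokens))
```

Each input is learned once per category, so training cost grows
with the number of categories, and the per-category scores are not
calibrated against each other. That may be one reason results from
this kind of arrangement can be weaker than a classifier built for
multiple categories from the start.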
----------------------------------------------------------------------
Comment By: Kenny Pitt (kpitt)
Date: 2004-03-09 21:04
Message:
Classification is performed based on "tokens" in each input,
and the SpamBayes tokenizer relies on standard e-mail format
to generate tokens that have proven useful for differentiating
spam from good messages. You could probably use the
classifier and storage portions on a different type of input by
writing a different tokenizer routine, but I can't guarantee
that everything is perfectly separated.
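
As a rough illustration of the "different tokenizer routine" idea,
here is a minimal sketch of a tokenizer for simple text input. The
function name tokenize and the min_len parameter are assumptions
made for this example and are not taken from the SpamBayes source;
the only thing carried over from the discussion is that the
classifier consumes a stream of tokens.

```python
import re

# Hypothetical plain-text tokenizer; not part of SpamBayes.
# Matches runs of lowercase letters, digits and apostrophes.
_WORD_RE = re.compile(r"[a-z0-9']+")


def tokenize(text, min_len=3):
    """Yield lowercase word tokens from plain text.

    Words shorter than min_len are dropped, on the assumption that
    very short words carry little information about the category.
    """
    for word in _WORD_RE.findall(text.lower()):
        if len(word) >= min_len:
            yield word
```

A real application would probably add domain-specific tokens
(for example word pairs or field names), since the choice of
tokens largely determines how well the categories separate.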
Also keep in mind that SpamBayes is only designed to support
two categories. It doesn't matter what the two categories
are as long as you give it the training data to define the
characteristics of each. However, if you need to send things
to 3 or more categories as POPFile does then the SpamBayes
code won't support that.
----------------------------------------------------------------------