[Spambayes] Stemming and stopword elimination
Alexander at Leidinger.net
Fri Jan 17 13:47:41 EST 2003
Has anyone already experimented with Information Retrieval techniques
like stopword elimination (stopwords: the, a, an, or, and, ...) and word
stemming?
See http://www.tartarus.org/~martin/PorterStemmer for a description of
the algorithm for English text and a Python implementation, or
http://snowball.tartarus.org/ for non-English stemmers.
I don't think this will change the failure rate significantly (maybe
better results with little training data, maybe worse; I don't expect
much change with large training data), but it should reduce the size of
the needed database.
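
To make the idea concrete, here is a rough sketch of what I mean. It is
not Spambayes code: the stopword list is a tiny illustrative sample, and
crude_stem() is a deliberately naive suffix stripper standing in for a
real stemmer such as Porter's (all names here are made up for the
example). It only shows how dropping stopwords and merging word forms
shrinks the set of distinct tokens a database would have to store.

```python
import re

# Tiny illustrative stopword list (assumption); a real list is much longer.
STOPWORDS = frozenset({"the", "a", "an", "or", "and", "of", "to", "in", "is"})

# Naive suffix stripping. This is NOT the Porter algorithm, only a stand-in.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def reduce_tokens(tokens):
    """Drop stopwords, then stem whatever is left."""
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]


text = ("The offers and the offer: offered, offering an offer "
        "to the winners or a winner")
raw = tokenize(text)
reduced = reduce_tokens(raw)

# Fewer distinct tokens means fewer entries in the token database.
print(len(set(raw)), "distinct tokens before,", len(set(reduced)), "after")
```

Whether the classifier does better or worse on the merged tokens is
exactly the open question; the sketch only illustrates the size effect.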
I believe the technical term is "Oops!"
http://www.Leidinger.net Alexander @ Leidinger.net
GPG fingerprint = C518 BC70 E67F 143F BE91 3365 79E2 9C60 B006 3FE7