[spambayes-dev] An unrelated idea: categorization / clusteranalysis of text files for FAQ generating

Kenny Pitt kennypitt at hotmail.com
Thu Sep 25 09:43:58 EDT 2003

Anssi Porttikivi wrote:
> Could you automatically categorize a set of messages into an optimum
> number of cluster subsets, where messages inside a subset would be
> similar to each other, in Bayesian filtering terms? If this could be
> done without manually selecting a priori the categories that the
> cluster subsets represent, it could be used for automated "frequently
> asked questions" list maintenance. Automatic categorization of
> incoming mail without manually choosing any criteria beforehand would
> also be interesting.

Sounds like you might want to look at POPFile, another open-source mail
filter that implements at least some of the concepts you describe.  It
allows you to define as many "buckets" as you want for sorting your
mail, and then you train it just like any other Bayesian-based filter as
messages arrive.
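
To make the bucket idea concrete, here is a minimal sketch of training and
classifying with an arbitrary number of buckets. It is not POPFile's code
(POPFile is written in Perl) and not SpamBayes' either; the class name
BucketClassifier, the crude tokenizer, and the add-one smoothing are all
assumptions made purely for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class BucketClassifier:
    """Toy multinomial naive Bayes over any number of named buckets.

    Hypothetical illustration only; real filters use better tokenizers
    and different scoring.
    """

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # bucket -> token -> count
        self.message_counts = Counter()          # bucket -> messages trained
        self.vocabulary = set()                  # every token seen so far

    def train(self, bucket, text):
        """Add one message to the named bucket."""
        tokens = tokenize(text)
        self.word_counts[bucket].update(tokens)
        self.message_counts[bucket] += 1
        self.vocabulary.update(tokens)

    def classify(self, text):
        """Return the most probable bucket, or None if nothing is trained."""
        tokens = tokenize(text)
        total_messages = sum(self.message_counts.values())
        vocab_size = len(self.vocabulary) or 1  # avoid dividing by zero
        best_bucket, best_score = None, float("-inf")
        for bucket, n_messages in self.message_counts.items():
            counts = self.word_counts[bucket]
            total_tokens = sum(counts.values())
            # Log prior: fraction of training messages in this bucket.
            score = math.log(n_messages / total_messages)
            # Log likelihood of each token, with add-one smoothing.
            for token in tokens:
                score += math.log(
                    (counts[token] + 1) / (total_tokens + vocab_size)
                )
            if score > best_score:
                best_bucket, best_score = bucket, score
        return best_bucket


clf = BucketClassifier()
clf.train("install", "How do I install the Outlook plugin on Windows?")
clf.train("install", "The installer fails on Windows XP")
clf.train("training", "How many messages should I train as ham and spam?")
clf.train("training", "Does training on mistakes only work better?")
print(clf.classify("installer error on Windows"))
```

Note that this is still supervised: you pick the bucket names yourself and
train each one by hand, so it covers the sorting half of your question
rather than discovering the clusters automatically.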

Check it out at http://popfile.sourceforge.net.  They have SourceForge
discussion forums that would probably be a good place to discuss your idea.

Kenny Pitt
