[spambayes-dev] An unrelated idea: categorization / cluster analysis of text files for FAQ generating

Anssi Porttikivi anssi.porttikivi at teleware.fi
Thu Sep 25 09:05:30 EDT 2003

Sorry to bother you, but I would like to know whether anyone here knows
of technologies along the lines of the following idea:

Could you automatically categorize a set of messages into an optimal
number of clusters, where the messages inside each cluster are
similar to each other in Bayesian filtering terms? If this could be
done without manually selecting the categories a priori, it could be
used for automated maintenance of a "frequently asked questions" list.
Automatic categorization of incoming mail without manually choosing
any criteria beforehand would also be
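
To make the idea concrete, here is a rough sketch of one way the
clustering could work. It is not Bayesian and it is not anything
SpamBayes provides: it is a plain greedy single-pass ("leader")
clustering on the Jaccard similarity of word sets, using only the
Python standard library. The function names and the threshold value
of 0.3 are assumptions made up for this illustration. The number of
clusters is not chosen in advance; it falls out of the threshold.

```python
import re


def tokens(text):
    """Return the set of lowercased word tokens in a message."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_messages(messages, threshold=0.3):
    """Greedy single-pass clustering of messages.

    Each message joins the existing cluster whose accumulated vocabulary
    is most similar to it, provided the similarity reaches `threshold`
    (a hypothetical tuning knob); otherwise it starts a new cluster.
    Returns a list of clusters, each a list of message indices.
    """
    clusters = []  # each entry: {"vocab": set of tokens, "members": [indices]}
    for i, msg in enumerate(messages):
        toks = tokens(msg)
        best, best_sim = None, threshold
        for c in clusters:
            sim = jaccard(toks, c["vocab"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"vocab": set(toks), "members": [i]})
        else:
            best["members"].append(i)
            best["vocab"] |= toks
    return [c["members"] for c in clusters]


if __name__ == "__main__":
    sample = [
        "How do I train the filter on spam?",
        "How do I train the filter on ham?",
        "Where can I download the Outlook plugin?",
        "Where can I download the Outlook plugin installer?",
    ]
    for members in cluster_messages(sample):
        print(members)
```

Each resulting cluster would be a candidate FAQ entry. The result
depends on message order and on the threshold, so this is only a
starting point for experiments, not a tuned method.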
