[spambayes-dev] An unrelated idea: categorization / clusteranalysis of text files for FAQ generating
skip at pobox.com
Thu Sep 25 10:23:17 EDT 2003
>> Could you automatically categorize a set of messages into an optimum
>> number of cluster subsets, where messages inside a subset would be
>> similar to each other, in Bayesian filtering terms? If this could be
>> done without manually selecting a priori the categories that the
>> cluster subsets represent, this could be used for automated "frequently
>> asked questions" list maintenance. Automatic categorization of
>> incoming mail without manually choosing any criteria beforehand would
>> also be interesting.
Kenny> Sounds like you might want to look at POPFile, another
Kenny> open-source mail filter that implements at least some of the
Kenny> concepts you describe.
If you're interested in playing around with Python or n-way classification,
you might take a look at contrib/nway.py in the SpamBayes distribution.
There's a hopefully decent docstring at the top of the file which shows how
to use it.
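
To give a rough idea of what "n-way classification" means, here is a toy
sketch. It is not the code in contrib/nway.py and it does not use the
SpamBayes API; the NWayClassifier class, its method names, and the
whitespace tokenizer are all made up for illustration. It assumes a plain
multinomial naive Bayes model with Laplace smoothing, and it still needs
hand-labelled training messages for each category, so it does not solve the
"no categories chosen beforehand" part of the question.

```python
import math
from collections import Counter, defaultdict


class NWayClassifier:
    """Toy multinomial naive Bayes over whitespace tokens (hypothetical,
    for illustration only; unrelated to the SpamBayes classifier)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> token counts
        self.doc_counts = Counter()              # category -> messages seen
        self.vocab = set()                       # every token seen in training

    def train(self, category, text):
        """Record one labelled message for the given category."""
        tokens = text.lower().split()
        self.word_counts[category].update(tokens)
        self.doc_counts[category] += 1
        self.vocab.update(tokens)

    def scores(self, text):
        """Return {category: log-probability score} for the text."""
        if not self.doc_counts:
            raise ValueError("classifier has not been trained")
        tokens = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        result = {}
        for category, counts in self.word_counts.items():
            total = sum(counts.values())
            # Start from the log prior for this category.
            score = math.log(self.doc_counts[category] / total_docs)
            for tok in tokens:
                # Laplace smoothing keeps unseen tokens from zeroing a category.
                score += math.log((counts[tok] + 1) / (total + len(self.vocab)))
            result[category] = score
        return result

    def classify(self, text):
        """Return the category with the highest score."""
        scores = self.scores(text)
        return max(scores, key=scores.get)


if __name__ == "__main__":
    clf = NWayClassifier()
    clf.train("install", "how do I install the outlook plugin")
    clf.train("install", "installer fails on windows setup")
    clf.train("training", "how many messages should I train on")
    clf.train("training", "training on ham and spam messages")
    print(clf.classify("the installer fails"))
```

With more than two categories the same idea applies unchanged: train each
category on its own messages and pick the best-scoring one.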