[spambayes-dev] An unrelated idea: categorization / clusteranalysis of text files for FAQ generating

Skip Montanaro skip at pobox.com
Thu Sep 25 10:23:17 EDT 2003


    >> Could you automatically categorize a set of messages into an optimum
    >> number of cluster subsets, where messages inside a subset would be
    >> similar to each other in Bayesian filtering terms? If this could be
    >> done without manually selecting the categories of the clusters a
    >> priori, it could be used for automated maintenance of a "frequently
    >> asked questions" list. Automatic categorization of incoming mail
    >> without manually choosing any criteria beforehand would also be
    >> interesting.

    Kenny> Sounds like you might want to look at POPFile, another
    Kenny> open-source mail filter that implements at least some of the
    Kenny> concepts you describe. 

If you're interested in playing around with Python and n-way classification,
you might take a look at contrib/nway.py in the SpamBayes distribution.
There's a hopefully decent docstring at the top of the file that shows how
to use it.
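To get a feel for what n-way classification means before reading nway.py, here's a minimal sketch. It is NOT nway.py's interface: the class and method names (NWayClassifier, train, scores, classify) and the sample categories are hypothetical, the tokenizer is a crude stand-in for the SpamBayes tokenizer, and the scoring is plain multinomial naive Bayes with add-one smoothing rather than SpamBayes' own scoring scheme.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercased word tokens; a crude stand-in for a real mail tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NWayClassifier:
    """Hypothetical multinomial naive Bayes over any number of categories."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> token counts
        self.doc_counts = Counter()              # category -> messages trained
        self.vocab = set()                       # every token seen in training

    def train(self, text, category):
        tokens = tokenize(text)
        self.word_counts[category].update(tokens)
        self.doc_counts[category] += 1
        self.vocab.update(tokens)

    def scores(self, text):
        """Log-probability score per category; higher means a better match."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        result = {}
        for cat, counts in self.word_counts.items():
            total_tokens = sum(counts.values())
            # Start from the log prior: the fraction of training messages in cat.
            score = math.log(self.doc_counts[cat] / total_docs)
            for tok in tokenize(text):
                # Add-one smoothing so an unseen token can't zero out a category.
                score += math.log((counts[tok] + 1) / (total_tokens + vocab_size))
            result[cat] = score
        return result

    def classify(self, text):
        s = self.scores(text)
        return max(s, key=s.get)


# Example usage with made-up training messages.
clf = NWayClassifier()
clf.train("how do I install spambayes on windows", "install")
clf.train("installer fails on windows xp", "install")
clf.train("too many ham messages end up as unsure", "training")
clf.train("how should I train on unsure messages", "training")
print(clf.classify("install problem on windows"))  # prints: install
```

Note that this still needs hand-labeled training messages: it sorts mail into categories you pick, but it doesn't discover the categories on its own, which is the harder clustering problem asked about above.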

Skip
