[Spambayes] Dictionary Analysis Tool
Tegels, Kent
Kent.Tegels at hdrinc.com
Thu Dec 11 17:36:03 EST 2003
Excellent. Thank you!
-----Original Message-----
From: Skip Montanaro [mailto:skip at pobox.com]
Sent: Thursday, December 11, 2003 2:30 PM
To: Tegels, Kent
Cc: spambayes at python.org
Subject: Re: [Spambayes] Dictionary Analysis Tool
Kent> I love the SpamClues feature, but I'd really like to know -- by
Kent> "word" what "words" have the highest to lowest probability of
Kent> occurring in my SpamBayes for all messages. Putting it another
Kent> way, I'd like to know what the top N most "spammy" words for me.
Kent> Is there a tool or other way to do this?
There is a spamcounts.py script in the SpamBayes contrib directory. It
will accept a regular expression to decide what tokens to display. Run
it like so:
spamcounts.py -r '.*'
and it will dump a CSV file to standard output which contains all the
tokens in your current database. It looks like so:
token,nspam,nham,spam prob
$63.01,1,0,0.844827586207
$1.99,1,0,0.844827586207
from:addr:detik.com,1,0,0.844827586207
four,1,2,0.310046433094
to:addr:ski,1,0,0.844827586207
"advertisers,",1,0,0.844827586207
08:06:09,0,1,0.155172413793
...
You can just pop that into Excel (or other favorite spreadsheet) and
sort by the "spam prob" column or feed it into a Python script which
uses the csv module to load it back up, sort it, then display the N rows
with the highest spam prob.
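
A minimal sketch of that last option, assuming Python 3 and the four-column
layout shown above (token, nspam, nham, spam prob). The function name
top_spammy and the SAMPLE text are made up for illustration; they are not
part of SpamBayes. The sample rows are copied from the output above.

```python
import csv
import io


def top_spammy(lines, n=10):
    """Return the n (token, spam_prob) pairs with the highest spam prob.

    `lines` is any iterable of CSV text lines, such as an open file.
    The column names are assumed to match the header shown above.
    """
    reader = csv.DictReader(lines)
    rows = [(row["token"], float(row["spam prob"])) for row in reader]
    # Highest probability first; ties keep their original order.
    rows.sort(key=lambda pair: pair[1], reverse=True)
    return rows[:n]


# A few rows copied from the sample output, for demonstration only.
SAMPLE = '''token,nspam,nham,spam prob
$63.01,1,0,0.844827586207
four,1,2,0.310046433094
"advertisers,",1,0,0.844827586207
08:06:09,0,1,0.155172413793
'''

for token, prob in top_spammy(io.StringIO(SAMPLE), n=3):
    print(f"{prob:.4f}  {token}")
```

To use it on a real database, redirect the script's output to a file first
(for example tokens.csv, a name chosen here only as an example), then pass
open("tokens.csv", newline="") to top_spammy in place of the StringIO
object. The csv module takes care of quoted tokens such as "advertisers,"
that contain a comma.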
Skip