[Spambayes] I have some questions....

Meyer, Tony T.A.Meyer at massey.ac.nz
Tue Oct 7 23:12:10 EDT 2003


> I used to be getting postings from people who are able to acquire
> a database of word count probabilities...   Something like this...
> 
> penis    .997
> mortgage .887
[...]
> Where there is a word,  then a decimal number from 0 - 1,  giving it
> a probability....
[...]

If you want results for the entire database, then you can use the
sb_dbexpimp.py tool in the scripts directory to export the database to a
'flat' text file.  Note that this won't include the score, but will
include the number of times each token has been seen in ham or spam.  If
you want to generate the score, you can get the formula from
classifier.py, and work it out from that (in Excel, for example).
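
As a rough sketch of working the score out from the exported counts (in
Python rather than Excel), something like the following should be close.
It follows the per-token calculation as I understand it from
classifier.py: the spam and ham counts are turned into ratios, and the
result is pulled toward a prior when the token has been seen only a few
times.  The function name token_score is made up for this example, and
the defaults 0.45 and 0.5 are assumed values for the unknown_word_strength
and unknown_word_prob options, so check both against your own copy of
classifier.py and your configuration before relying on the numbers.

```python
def token_score(spamcount, hamcount, nspam, nham,
                unknown_word_strength=0.45, unknown_word_prob=0.5):
    """Estimate the spam probability of a single token.

    spamcount/hamcount: messages of each kind the token was seen in.
    nspam/nham: total messages of each kind that were trained.
    The two keyword defaults are assumptions, not values read from
    your configuration.
    """
    n = spamcount + hamcount
    if n == 0:
        # A token that was never seen just gets the prior.
        return unknown_word_prob

    # Avoid dividing by zero when only one kind of mail has been trained.
    spamratio = spamcount / float(nspam or 1)
    hamratio = hamcount / float(nham or 1)

    # Raw estimate from the counts alone.
    prob = spamratio / (hamratio + spamratio)

    # Pull the raw estimate toward the prior; the fewer times the token
    # has been seen, the stronger the pull.
    s = unknown_word_strength
    x = unknown_word_prob
    return (s * x + n * prob) / (s + n)


if __name__ == "__main__":
    # Hypothetical counts, purely for illustration.
    print(token_score(spamcount=10, hamcount=0, nspam=100, nham=100))
```

A token seen only in spam comes out close to, but never exactly, 1.0,
and the adjustment matters most for tokens with very low counts.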

If you want the scores for particular words, then the web interface has
a "Word Query" box that will give you this.  In 1.0a6 you can put a '*'
at the end of a word as a wildcard (it will give a maximum of 10
results, although you can edit UserInterface.py to change this).  If you
use current CVS, there is an 'Advanced Find' that you can enable that
will let you change the maximum number of results, do regular expression
searches, and so on (you could get the entire database this way, too,
by setting the maximum number of results to an impossibly huge number
and doing a wildcard search for "*").

If you want the clues for a particular message, you can get them by
enabling the 'evidence' header, by pasting the message into the
"Classify message" box on the web interface, or by using the "Show
Clues" link in the review page.  If you want the tokens for a particular
message, current CVS will offer these from the review page, too.

=Tony Meyer


