[Spambayes] How to Display tokenized ham/spam scores?

Tue Aug 19 20:14:53 EDT 2003

Hello Joerg,

Thanks for your initial tip. However, I am not very Python-literate, and I
don't have a way of opening the hammie.db format. (I did read the
code comments, and they are very understandable, but I can't picture the final
output after the classifier classifies/trains on the tokenizer output.)

What I meant exactly is: how could I extract this kind of table information
(or something similar)?

word                      spamprob     #ham   #spam
'*H*'                     0               -       -
'*S*'                     1               -       -
'jon-'                    0.0313807      38       1
'from:addr:jonathan'      0.06584         3       0
'noheader:mime-version'   0.267816     3682    1332
'there'                   0.357648     1865    1027
'web'                     0.359379     1678     931
'noheader:reply-to'       0.398404     8311    5444
'reply-to:none'           0.398404     8311    5444
'your'                    0.607781     3493    5354
'now'                     0.609287     1198    1848
'header:Date:1'           0.614892     5565    8789
'header:From:1'           0.616075     5536    8787
'live'                    0.617098      227     362
'subject:Jon'             0.628519      123     206
"you've"                  0.635875      294     508
'potential'               0.637791      171     298
'header:Received:6'       0.639839      738    1297
'url:com'                 0.643368     3651    6515
'must'                    0.651722      330     611

I got the table above from another SpamBayes discussion elsewhere.
As I understand it, hammie.db stores this type of information, but I am
not sure about this. All I know is that SpamBayes works :)

Thanks!

---jake

-----Original Message-----
From: Joerg Beyer [mailto:job at webde-ag.de]
Sent: Tuesday, August 19, 2003 5:58 PM
To: Jake
Cc: SpamBayes at python.org
Subject: Re: [Spambayes] How to Display tokenized ham/spam scores?

Jake wrote:
> Hello there,
>
> How can I display the actual ham/spam scores for words/tokens,
> the ones that get written into hammie.db for
> classification?
For the dbm version of the storage, you can do this:
open the dbm file and iterate over the keys (each key is a token).
For each key, extract a Python object, which is a pickled
object (in most cases a 2-tuple with the ham and spam counts
for the key; sometimes a 3-tuple, but I don't know yet why).
That way you can extract the ham/spam counts for each token (roughly,
a token is a word from a mail, plus special words such as how
many entries were in the To: and Cc: fields of the header).
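Joerg's steps can be sketched in Python 3 roughly as follows. This is a minimal sketch, not SpamBayes code: `read_token_counts` is a hypothetical helper, and it assumes hammie.db is in a dbm format that your Python's `dbm` module can open (an older Berkeley DB file may not be) and that every value is a pickled tuple. Which tuple element is the ham count and which is the spam count is not stated here, so the sketch prints the raw values; check the SpamBayes storage source before relying on an order.

```python
import dbm
import pickle
import sys


def read_token_counts(path):
    """Return {token: unpickled value} for every key in a dbm-backed hammie.db.

    Assumptions: the file is in a dbm format this Python can open, and each
    value is a pickled object (usually a 2-tuple of counts, per the note above).
    """
    counts = {}
    with dbm.open(path, "r") as db:
        for key in db.keys():
            # Keys come back as bytes; decode them so tokens print readably.
            token = key.decode("utf-8", errors="replace")
            # Each value is a pickle; unpickle it to get the raw tuple.
            counts[token] = pickle.loads(db[key])
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python dump_hammie.py hammie.db (the script name is arbitrary)
    for token, value in sorted(read_token_counts(sys.argv[1]).items()):
        print(repr(token), value)
```

Sorting by token keeps the dump stable between runs; any 3-tuple entries show up as-is, so you can see for yourself which keys they belong to.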
> I am interested in how exactly the algorithm works.
Read the source; it is heavily annotated with comments that
say why something is done.
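On the question of how the counts become the spamprob column: as far as I understand SpamBayes' classifier.py, each token's probability is computed on the fly from its ham/spam counts and the total numbers of trained hams and spams, using Gary Robinson's adjustment that pulls rarely seen tokens towards a neutral prior. The sketch below is my own illustration, not SpamBayes code; the function name is hypothetical, and the defaults s=0.45 and x=0.5 are assumed to match the unknown_word_strength and unknown_word_prob options. It does not cover how the per-token probabilities are combined into the final message score.

```python
def token_spamprob(hamcount, spamcount, nham, nspam, s=0.45, x=0.5):
    """Robinson-style per-token spam probability (illustrative sketch).

    hamcount/spamcount: number of trained ham/spam messages containing the token.
    nham/nspam: total number of trained ham/spam messages.
    s, x: strength and value of the prior for rarely seen tokens
          (assumed defaults, not read from any SpamBayes config).
    """
    # Normalise by corpus size so an unbalanced training set does not skew p.
    hamratio = hamcount / max(nham, 1)
    spamratio = spamcount / max(nspam, 1)
    if hamratio + spamratio == 0:
        return x  # token never seen: fall back to the prior
    p = spamratio / (hamratio + spamratio)
    # Shrink p towards x; the fewer times the token was seen, the stronger the pull.
    n = hamcount + spamcount
    return (s * x + n * p) / (s + n)
```

For example, with equal numbers of trained hams and spams, a token seen in 3 hams and no spams comes out at about 0.065 rather than 0, because three sightings are not much evidence.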
Hope this helps,
Joerg