[Spambayes] How to Display tokenized ham/spam scores?

Jake jake.angulo at bigfoot.com
Tue Aug 19 20:14:53 EDT 2003


Hello Joerg,

Thanks for your initial tip. However, I am not very Python-literate, and I
don't have a way of opening the hammie.db format. (I did try to read the
code comments, and they are very understandable, but I can't picture the
final output after the classifier classifies/trains on the tokenizer output.)

What I meant exactly is: how could I extract this kind of table information
(or something similar):

word                     spamprob     #ham  #spam
'*H*'                    0               -      -
'*S*'                    1               -      -
'jon-'                   0.0313807      38      1
'from:addr:jonathan'     0.06584         3      0
'noheader:mime-version'  0.267816     3682   1332
'there'                  0.357648     1865   1027
'web'                    0.359379     1678    931
'noheader:reply-to'      0.398404     8311   5444
'reply-to:none'          0.398404     8311   5444
'your'                   0.607781     3493   5354
'now'                    0.609287     1198   1848
'header:Date:1'          0.614892     5565   8789
'header:From:1'          0.616075     5536   8787
'live'                   0.617098      227    362
'subject:Jon'            0.628519      123    206
"you've"                 0.635875      294    508
'potential'              0.637791      171    298
'header:Received:6'      0.639839      738   1297
'url:com'                0.643368     3651   6515
'must'                   0.651722      330    611
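Once you have (token, spamprob, #ham, #spam) rows, printing them in the layout above is plain string formatting. A minimal sketch; format_clue_table is a hypothetical name, and the sample rows are simply copied from the table above.

```python
def format_clue_table(rows):
    """Render (token, spamprob, hamcount, spamcount) rows as aligned text.

    Rows are sorted by spamprob, lowest (most ham-like) first, which is
    the ordering used in the table above.
    """
    lines = [f"{'word':<28}{'spamprob':>10}{'#ham':>8}{'#spam':>8}"]
    for token, prob, ham, spam in sorted(rows, key=lambda r: r[1]):
        # !r quotes the token the same way the table does.
        lines.append(f"{token!r:<28}{prob:>10.6g}{ham:>8}{spam:>8}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        ("web", 0.359379, 1678, 931),
        ("jon-", 0.0313807, 38, 1),
        ("must", 0.651722, 330, 611),
    ]
    print(format_clue_table(sample))
```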



I just got the above table from another spambayes discussion elsewhere.
As I understand it, hammie.db stores this kind of information? I'm not
so sure about this. All I know is that spambayes works :)

Thanks!

---jake


-----Original Message-----
From: Joerg Beyer [mailto:job at webde-ag.de]
Sent: Tuesday, August 19, 2003 5:58 PM
To: Jake
Cc: SpamBayes at python.org
Subject: Re: [Spambayes] How to Display tokenized ham/spam scores?

Jake wrote:
> Hello there,
>
> How can i display the actual ham/spam scoring for words/tokens
> ble)? --- the ones that get written into the hammie.db for
> classification.
For the dbm version of the storage you can do this:
open the dbm file and iterate over its keys (each key is a token).
For each key, extract the stored Python object, which is pickled
(in most cases a 2-tuple: the ham and spam counts for the key;
sometimes a 3-tuple, but I don't know yet why).
That way you can extract the ham/spam counts for each token (roughly,
a token is a word from a mail, plus special words, such as how
many entries there were in the To: and Cc: fields of the header).
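Joerg's recipe can be sketched in Python 3 roughly as follows. This is a minimal sketch under several assumptions: that the file is readable by the standard dbm module (a 2003-era hammie.db was likely written by Python 2's anydbm, possibly in Berkeley DB format, which Python 3's dbm may not be able to open); that every value is a pickle; and that a 2-tuple is ordered (spamcount, hamcount), which is how I believe SpamBayes' WordInfo pickled itself, whereas Joerg describes the pair as ham and spam, so check classifier.py in your version. Records of any other shape, such as the occasional 3-tuple, are skipped rather than guessed at. The name dump_token_counts is hypothetical.

```python
import dbm
import os
import pickle
import tempfile


def dump_token_counts(path):
    """Yield (token, spamcount, hamcount) for each 2-tuple record in a dbm file.

    ASSUMPTIONS (not confirmed in this thread): every value is a pickle,
    and 2-tuples are ordered (spamcount, hamcount). Any other shape,
    e.g. the occasional 3-tuple, is skipped rather than interpreted.
    """
    with dbm.open(path, "r") as db:
        for key in db.keys():
            value = pickle.loads(db[key])
            if isinstance(value, tuple) and len(value) == 2:
                spamcount, hamcount = value
                yield key.decode("utf-8", "replace"), spamcount, hamcount


if __name__ == "__main__":
    # Build a tiny stand-in database so the sketch runs without a real hammie.db.
    path = os.path.join(tempfile.mkdtemp(), "demo")
    with dbm.open(path, "c") as db:
        db[b"url:com"] = pickle.dumps((6515, 3651))
        db[b"must"] = pickle.dumps((611, 330))
        db[b"some-3-tuple"] = pickle.dumps((1, 2, 3))  # hypothetical; skipped
    for token, spam, ham in sorted(dump_token_counts(path)):
        print(f"{token!r:<12} ham={ham:<6} spam={spam}")
```

Skipping unknown shapes keeps the dump honest: it only prints what it can interpret under the stated assumption, instead of mislabelling a bookkeeping record as a token.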
> Am interested on how the algorithm works exactly.
Read the source; it is heavily annotated with comments that
say why something is done.
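If the store holds only counts, as described above, then a spamprob column like the one in Jake's table has to be computed from those counts plus the total numbers of trained spam and ham messages (nspam, nham). The sketch below is modelled on the Robinson adjustment f(w) = (s*x + n*p(w)) / (s + n) that, as far as I know, SpamBayes' classifier.py uses, where p(w) compares the token's per-class frequencies and n is its total count. The defaults s = 0.45 and x = 0.5 are my recollection of SpamBayes' unknown_word_strength and unknown_word_prob options; treat them, and the name token_spamprob, as assumptions. This is not claimed to reproduce the numbers in the table exactly.

```python
def token_spamprob(spamcount, hamcount, nspam, nham, s=0.45, x=0.5):
    """Smoothed estimate of P(spam | token) from training counts.

    Modelled on the Robinson adjustment f(w) = (s*x + n*p(w)) / (s + n).
    ASSUMPTION: s=0.45 and x=0.5 stand in for SpamBayes'
    unknown_word_strength and unknown_word_prob defaults.
    """
    n = spamcount + hamcount
    if n == 0:
        # No evidence at all: f(w) reduces to x.
        return x
    # Normalise by the number of trained messages per class (guarding against 0).
    hamratio = hamcount / (nham or 1)
    spamratio = spamcount / (nspam or 1)
    p = spamratio / (hamratio + spamratio)
    # Shrink p toward x; the larger n is, the less it moves.
    return (s * x + n * p) / (s + n)


if __name__ == "__main__":
    # Hypothetical totals; a token seen in 3 hams and no spams.
    print(token_spamprob(spamcount=0, hamcount=3, nspam=1000, nham=1000))
```

Because the probability is a function of the counts (and the fixed totals), two tokens with identical counts always get identical probabilities, which is why 'noheader:reply-to' and 'reply-to:none' share 0.398404 in the table.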
hope this helps
Joerg



