[spambayes-dev] RE: [Spambayes] Watch out for digests...

Kenny Pitt kennypitt at hotmail.com
Thu Dec 11 09:58:45 EST 2003


Skip Montanaro wrote:
> I'd be interested to see what others' hapax fractions are:
> 
>     >>> import shelve
>     >>> db = shelve.open(".hammiedb")
>     >>> n = 0
>     >>> len([k for k in db if db[k] in [(0,1),(1,0)]])
>     7731
>     >>> len(db)
>     9769
>     >>> len([k for k in db if db[k] in [(0,1),(1,0)]])/float(len(db)-1)
>     0.79146191646191644

My current Outlook training database has 40 good and 59 spam messages.
Here are my results:

>>> len([k for k in db if db[k] in [(0,1),(1,0)]])
8158
>>> len(db)
11274
>>> len([k for k in db if db[k] in [(0,1),(1,0)]])/float(len(db)-1)
0.72367604009580411
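
For anyone who wants to repeat the measurement, here is a minimal sketch of
the same calculation wrapped in a function. It assumes, as the one-liners
above do, that each value in the database compares equal to a two-element
count tuple, and that a hapax is an entry whose counts are exactly (0,1) or
(1,0). The names hapax_fraction and non_token_entries are made up for this
example, and reading the "-1" as discounting a single non-token bookkeeping
record is a guess, not something stated in the thread.

```python
def hapax_fraction(db, non_token_entries=1):
    """Return the fraction of entries seen exactly once in training.

    db is any mapping (a dict or an open shelve) whose values compare
    equal to two-element count tuples.
    non_token_entries is a hypothetical parameter: the number of records
    that are not tokens. It defaults to 1 to mirror the len(db)-1 used
    in the one-liners (assumed to be a bookkeeping record).
    """
    hapax_counts = ((0, 1), (1, 0))
    # Count the entries that were seen exactly once, in either class.
    hapaxes = sum(1 for key in db if db[key] in hapax_counts)
    total = len(db) - non_token_entries
    if total <= 0:
        raise ValueError("database holds no token entries")
    return hapaxes / float(total)


# Toy data, invented for illustration only (not taken from a real database).
toy_db = {
    "bookkeeping record": (40, 59),  # stands in for the assumed non-token entry
    "viagra": (1, 0),
    "meeting": (0, 1),
    "free": (5, 2),
    "python": (0, 7),
}
print(hapax_fraction(toy_db))  # 2 hapaxes out of 4 tokens
```

With a real database the call would be hapax_fraction(db) after the
shelve.open() shown in Skip's message.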

-- 
Kenny Pitt



