[spambayes-dev] RE: [Spambayes] Watch out for digests...
Kenny Pitt
kennypitt at hotmail.com
Thu Dec 11 09:58:45 EST 2003
Skip Montanaro wrote:
> I'd be interested to see what others' hapax fractions are:
>
> >>> import shelve
> >>> db = shelve.open(".hammiedb")
> >>> n = 0
> >>> len([k for k in db if db[k] in [(0,1),(1,0)]])
> 7731
> >>> len(db)
> 9769
> >>> len([k for k in db if db[k] in [(0,1),(1,0)]])/float(len(db)-1)
> 0.79146191646191644
My current Outlook training database has 40 good and 59 spam messages.
Here are my results:
>>> len([k for k in db if db[k] in [(0,1),(1,0)]])
8158
>>> len(db)
11274
>>> len([k for k in db if db[k] in [(0,1),(1,0)]])/float(len(db)-1)
0.72367604009580411
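
For anyone who wants to repeat the measurement without retyping the
one-liner, here is a minimal sketch of the same calculation as a function.
The name hapax_fraction and the bookkeeping_keys parameter are made up for
illustration and are not part of spambayes. It assumes the database behaves
like a mapping from token to a pair of counts, and that the "-1" in the
session above is there to exclude one non-token entry (that reading is a
guess).

```python
def hapax_fraction(counts, bookkeeping_keys=1):
    """Return the fraction of entries whose count pair is (0,1) or (1,0).

    counts: any mapping from key to a pair of counts (assumed layout).
    bookkeeping_keys: hypothetical parameter, the number of non-token
        entries to leave out of the denominator (the "-1" in the thread).
    """
    hapax_pairs = ((0, 1), (1, 0))
    # A hapax is a token seen exactly once across all training messages.
    hapaxes = sum(1 for value in counts.values() if value in hapax_pairs)
    total = len(counts) - bookkeeping_keys
    if total <= 0:
        return 0.0
    return hapaxes / float(total)


# Toy data only, not taken from a real training database.
toy = {
    "once-a": (0, 1),
    "once-b": (1, 0),
    "common": (3, 2),
    "state entry": (40, 59),  # stands in for the assumed non-token entry
}
print(hapax_fraction(toy))
```

With a real database this would be called as hapax_fraction(db) after
db = shelve.open(...), as in the sessions above.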
--
Kenny Pitt