[spambayes-dev] Re: [Spambayes] Database cleaning?
T. Alexander Popiel
popiel at wolfskeep.com
Sun Jun 1 21:46:21 EDT 2003
In message: <1054430548.31.1335 at sake.mondoinfo.com>
Matthew Dixon Cowles <matt at mondoinfo.com> writes:
>
>I tore that code out and instead hacked the classifier so that I
>could determine how soon after a word figures in scoring that it's
>used again. I think that the results are at least slightly
>interesting. Note that the histogram below is log scaled.
[ snip of histogram showing an apparent exponential
dropoff in usage frequency ]
Yes, this is a very interesting result. I'm not sure it's
actually useful, but it is pretty.
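For anyone who wants to reproduce that kind of measurement, here's a rough sketch of the bookkeeping. It's not Matt's actual hack. It assumes a hypothetical input, `scored_messages`: one list per scored message of the tokens that figured in its score, in arrival order. The gap is counted in messages, and reuses are bucketed by powers of two, which gives you the log-scaled histogram.

```python
from collections import Counter


def reuse_intervals(scored_messages):
    """Yield, for every reuse of a token, how many messages have passed
    since that token last figured in scoring.

    scored_messages is a hypothetical input: an ordered sequence of
    per-message token lists (not a real SpamBayes classifier API).
    """
    last_seen = {}
    for index, clues in enumerate(scored_messages):
        # A token that appears twice in one message counts once.
        for token in set(clues):
            if token in last_seen:
                yield index - last_seen[token]
            last_seen[token] = index


def log2_histogram(values):
    """Bucket positive integers by floor(log2(value)), using integer
    arithmetic so there's no floating-point rounding at bucket edges."""
    return Counter(v.bit_length() - 1 for v in values)


if __name__ == "__main__":
    messages = [["a", "b"], ["a"], ["b", "c"], ["a", "c"]]
    gaps = sorted(reuse_intervals(messages))
    print(gaps)                  # [1, 1, 2, 2]
    print(log2_histogram(gaps))  # Counter({0: 2, 1: 2})
```

An exponential dropoff in the raw gaps shows up as a roughly straight, declining line once the buckets are plotted on a log scale.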
Another thing that would be interesting to plot is a histogram of the
average frequency at which each token gets used... which might give
us some idea of how large a DB is actually useful.
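Something like the following might do for that second histogram. Again it's only a sketch, using the same hypothetical `scored_messages` input: a list of per-message lists of the tokens that figured in scoring. It counts how many messages each token was used in. It then buckets each token by its average interval, "used about once every 2**k messages". Tokens that pile up in the high buckets would be candidates for pruning.

```python
from collections import Counter


def usage_counts(scored_messages):
    """Count the number of messages each token figured in.

    scored_messages is a hypothetical input (per-message token lists),
    not an actual SpamBayes API.
    """
    counts = Counter()
    for clues in scored_messages:
        counts.update(set(clues))  # at most one use per message
    return counts


def interval_histogram(counts, total_messages):
    """Bucket k counts tokens used on average about once every 2**k to
    2**(k+1) - 1 messages. Since a token can appear in at most
    total_messages messages, total_messages // n is always >= 1."""
    return Counter((total_messages // n).bit_length() - 1
                   for n in counts.values())


if __name__ == "__main__":
    messages = [["a", "b"], ["a"], ["b", "c"], ["a", "c"]]
    counts = usage_counts(messages)
    print(counts)                                    # a: 3, b: 2, c: 2
    print(interval_histogram(counts, len(messages)))  # Counter({1: 2, 0: 1})
```

Summing the buckets up to some cutoff would show roughly how many tokens a DB needs to keep to cover everything used at least that often.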
- Alex