[Spambayes] How low can you go?

Tony Meyer tameyer at ihug.co.nz
Thu Dec 18 03:08:43 EST 2003


[Tim]
> I see that it's a cruder approximation to the suggested 
> scoring algorithm (which I implemented at one time).
[...]
> It's harder to code a tiling method;

Exactly <wink>.

> BTW, it should *not* be necessary to increase 
> max_discriminators, and doing so can create subtle numeric
> problems in the inverse chi-squared function.
> Without this option, in an N-token message, N tokens were 
> candidates for scoring; with this option, there are still
> exactly N candidates for scoring; with a true tiling
> implementation, there are no more than N 
> candidates for scoring (and usually less than N).
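To make the candidate-count argument concrete, here is a minimal sketch of a greedy tiling in Python. It is not the scoring code from either message. The function names, the `bi:` prefix, and the rule of ranking features by distance of their spam probability from 0.5 are all assumptions made for illustration. The scheme considers every unigram and adjacent bigram, then keeps the strongest features first and skips any feature that overlaps a token position already covered. Each kept feature claims at least one previously uncovered position, so an N-token message never yields more than N features. Every chosen bigram covers two positions, so the count drops below N whenever a bigram wins.

```python
from typing import Dict, List, Tuple

# Hypothetical feature representation: (feature text, start index, end index exclusive)
Feature = Tuple[str, int, int]

def candidate_features(tokens: List[str]) -> List[Feature]:
    """All unigrams plus adjacent-pair bigrams, with the token span each covers."""
    feats = [(tok, i, i + 1) for i, tok in enumerate(tokens)]
    feats += [("bi:%s %s" % (tokens[i], tokens[i + 1]), i, i + 2)
              for i in range(len(tokens) - 1)]
    return feats

def tile(tokens: List[str], spamprob: Dict[str, float],
         unknown: float = 0.5) -> List[Feature]:
    """Greedy tiling (assumed scheme): strongest features first, no overlaps."""
    def strength(feat: Feature) -> float:
        # Assumption: strength = distance of the feature's spam probability from 0.5
        return abs(spamprob.get(feat[0], unknown) - 0.5)

    covered = [False] * len(tokens)
    chosen: List[Feature] = []
    # sorted() is stable, so ties keep list order (unigrams before bigrams)
    for feat in sorted(candidate_features(tokens), key=strength, reverse=True):
        _, start, end = feat
        if not any(covered[start:end]):
            chosen.append(feat)
            for i in range(start, end):
                covered[i] = True
    return chosen
```

For example, with `tokens = ["free", "money", "now"]` and a strong bigram `"bi:free money"`, the bigram claims positions 0 and 1, "now" claims position 2, and only two features reach the scorer.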

So the comment in this message:
<http://mail.python.org/pipermail/spambayes-dev/2003-September/001005.html>
refers only to the case where both unigrams *and* bigrams are generated,
not to the case where the tiling (or my crude approximation of it) is used?

I did get improvements with a higher max_discriminators:
<http://mail.python.org/pipermail/spambayes-dev/2003-September/001018.html>
Is that likely to be just a side-effect of the crudeness of my
approximation?
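Tim's warning about numeric trouble in the inverse chi-squared function can be illustrated with a small sketch. The function below uses the standard series form of the chi-squared survival function for even degrees of freedom. I am assuming this is close in spirit to the combining code, but the names `chi2q` and `chi2q_log` are hypothetical, not SpamBayes' API. With k clues, Fisher-style combining uses 2k degrees of freedom. The series starts from `exp(-x2/2)`, which underflows to 0.0 in double precision once `x2/2` exceeds roughly 745. Enough clues therefore push the whole sum to 0 even when the true probability is near one half. The log-space variant is one possible remedy, again only a sketch.

```python
import math

def chi2q(x2: float, v: int) -> float:
    """P(chisq >= x2) for even v, via exp(-m) * sum_{i < v/2} m**i / i!, m = x2/2."""
    assert v % 2 == 0
    m = x2 / 2.0
    total = term = math.exp(-m)  # becomes 0.0 once m exceeds about 745
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def chi2q_log(x2: float, v: int) -> float:
    """Same series, but each term is kept as a logarithm to avoid underflow."""
    assert v % 2 == 0
    m = x2 / 2.0
    if m == 0.0:
        return 1.0
    log_term = -m
    log_terms = [log_term]
    for i in range(1, v // 2):
        log_term += math.log(m / i)
        log_terms.append(log_term)
    peak = max(log_terms)  # factor out the largest term (log-sum-exp)
    total = sum(math.exp(t - peak) for t in log_terms)
    return min(math.exp(peak) * total, 1.0)
```

With 1600 degrees of freedom (800 clues) and a statistic equal to its expected value, `chi2q(1600, 1600)` returns 0.0, while `chi2q_log(1600, 1600)` returns roughly one half. For small inputs such as `(4, 4)` the two agree.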

=Tony Meyer
