[Spambayes] progress on POP+VM+ZODB deployment
T. Alexander Popiel
Sat Oct 26 00:37:39 2002
In message: <Pine.LNX.4.33L2.email@example.com>
Derek Simkowiak <firstname.lastname@example.org> writes:
> I just thought of another argument for a stock "starter.db".
> How can we test out new algorithms if the project doesn't have a
>control group? We have no way of knowing if someone's successful (or
>poor) results are an attribute of the new algorithm, or if it's an
>attribute of their particular sample data.
That's why we have multiple people test anything that looks promising,
and compare the variations across all the different runs. Since the
classifications are reproducible over given corpora, we don't need
control groups in the same way that biological experiments do.
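As a rough illustration of comparing runs across testers, here is a minimal sketch. The tester names and error rates are invented placeholders, not real results, and `summarize` is a hypothetical helper rather than anything from the project's test scripts.

```python
# Placeholder error rates (percent) per tester corpus: (baseline, candidate).
# These numbers are invented for illustration only; they are not real results.
runs = {
    "tester_a": (1.20, 0.90),
    "tester_b": (0.80, 0.85),
    "tester_c": (2.10, 1.60),
}

def summarize(runs, tolerance=0.0):
    """Count corpora where the candidate won, tied, or lost against the baseline."""
    won = tied = lost = 0
    for baseline, candidate in runs.values():
        delta = candidate - baseline  # negative means fewer errors
        if delta < -tolerance:
            won += 1
        elif delta > tolerance:
            lost += 1
        else:
            tied += 1
    return won, tied, lost

won, tied, lost = summarize(runs)
print(f"won {won}, tied {tied}, lost {lost}")
```

Because each run is deterministic for a given corpus and code version, a change that wins on most testers' corpora is unlikely to be an accident of one person's mail.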
> Having a starter.db would both (a) make life easier for getting
>started, and (b) give us a well-established baseline to test against.
I disagree with (b), because changes in the tokenizer (where I suspect
some of the advances will come from) will invalidate the database.
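To make the tokenizer point concrete, here is a toy sketch. Both tokenizers and the tiny "database" are hypothetical stand-ins, not the project's actual tokenizer or storage format. It shows that a database built with one tokenizer can have no matching entries once messages are tokenized differently.

```python
import re
from collections import Counter

def tokenize_v1(text):
    # Hypothetical "old" tokenizer: split on whitespace, keep case and punctuation.
    return text.split()

def tokenize_v2(text):
    # Hypothetical "new" tokenizer: lowercase, keep only alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())

def train(messages, tokenize):
    # Toy database: number of training messages each token appeared in.
    db = Counter()
    for msg in messages:
        db.update(set(tokenize(msg)))
    return db

def coverage(db, message, tokenize):
    # Fraction of the message's distinct tokens that the database knows about.
    tokens = set(tokenize(message))
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in db) / len(tokens)

spam = ["FREE Money!!! Click NOW!", "Click here: FREE offer!"]
db_v1 = train(spam, tokenize_v1)  # database built with the old tokenizer

probe = "FREE Money!!! Click NOW!"
cov_same = coverage(db_v1, probe, tokenize_v1)     # same tokenizer as training
cov_changed = coverage(db_v1, probe, tokenize_v2)  # tokenizer changed afterwards
print(cov_same, cov_changed)
```

In this toy case the old database recognizes none of the tokens produced by the new tokenizer, so a shipped starter.db would have to be retrained from the original corpus after each tokenizer change.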