[spambayes-dev] testing tweaks
T. Alexander Popiel
popiel at wolfskeep.com
Thu Aug 7 12:56:59 EDT 2003
In message: <20030807182048.A95B516F18 at jmason.org>
jm at jmason.org (Justin Mason) writes:
><delurk> Hey SBers,
>Have you guys considered testing how a tweak affects DB size -- i.e. including
>that in the test results output? I find that's a pretty major factor in
>a lot of cases in SpamAssassin. </delurk>
We've looked at DB size a couple of times in the past, but two factors
complicate it. First, the actual DB size (as opposed to token counts)
depends heavily on which backend you use. Second, people have very
different thresholds for what counts as acceptable space usage. As a
result, it's very difficult to reach any consensus on what sort of DB
size behaviour is acceptable.
Add to that the fact that the biggest influence on DB size is training
style... and no two of us use the same style, and little support has
been built for simulating different people's training styles in
testing. (There's the bare beginning of a framework in the incremental
stuff I did, but not enough training rules have been written yet to
simulate different styles.)
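A toy simulation may make the training-style point more concrete. The sketch below is purely hypothetical: `ToyTokenDB`, `train_everything`, and `train_on_mistakes` are illustrative names, not SpamBayes APIs, and the scoring is a crude stand-in for the real classifier. It trains the same small corpus two ways (on every message, and only on messages the current database gets wrong or is unsure about) and reports a backend-independent token count next to one arbitrary serialized size, the sort of numbers a test harness could print per tweak.

```python
import json
from collections import defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase, split on whitespace, one vote per token per message.
    return set(text.lower().split())


class ToyTokenDB:
    """Hypothetical stand-in for a classifier database: token -> [spam_count, ham_count]."""

    def __init__(self):
        self.counts = defaultdict(lambda: [0, 0])
        self.nspam = 0
        self.nham = 0

    def train(self, text, is_spam):
        for tok in tokenize(text):
            self.counts[tok][0 if is_spam else 1] += 1
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1

    def score(self, text):
        # Crude spam probability: mean of per-token estimates; 0.5 when nothing is known.
        probs = []
        for tok in tokenize(text):
            if tok not in self.counts:  # membership test does not create an entry
                continue
            s, h = self.counts[tok]
            s_ratio = s / max(self.nspam, 1)
            h_ratio = h / max(self.nham, 1)
            probs.append(s_ratio / (s_ratio + h_ratio))
        return sum(probs) / len(probs) if probs else 0.5

    def stats(self):
        # Token count is backend-independent; the byte size is just one arbitrary
        # serialization (JSON), and a real backend would give a different number.
        blob = json.dumps(dict(self.counts)).encode("utf-8")
        return {"tokens": len(self.counts), "bytes": len(blob)}


def train_everything(corpus):
    # Training style 1: train on every message.
    db = ToyTokenDB()
    for text, is_spam in corpus:
        db.train(text, is_spam)
    return db


def train_on_mistakes(corpus, margin=0.1):
    # Training style 2: train only when the current score is wrong or unsure,
    # i.e. not within `margin` of the correct extreme.
    db = ToyTokenDB()
    for text, is_spam in corpus:
        p = db.score(text)
        confident_and_right = p >= 1 - margin if is_spam else p <= margin
        if not confident_and_right:
            db.train(text, is_spam)
    return db


# Tiny made-up corpus of (text, is_spam) pairs, for illustration only.
corpus = [
    ("cheap meds buy now", True),
    ("meeting agenda for monday", False),
    ("buy cheap watches now", True),
    ("lunch on monday?", False),
    ("cheap meds cheap watches", True),
    ("agenda attached for the meeting", False),
]

for name, style in [("everything", train_everything), ("mistakes", train_on_mistakes)]:
    print(name, style(corpus).stats())
```

On this toy corpus the train-on-mistakes database ends up with fewer tokens, simply because it trains on a subset of the messages; real differences would depend entirely on the mail stream and the rules used to decide when to train.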
I personally am happy to give a couple of gigabytes to training data
(aka my historical mail record... I never delete any mail anymore),
and up to about 50 megabytes to the live database (it's currently
bounded at about 20 megabytes by my training style). I'm sure that
Tim's sister would have different priorities.
So yes, we've considered it, but only barely, and not recently.
This is one area where theory has fallen to lassitude.