[spambayes-dev] Reduced training test results
T. Alexander Popiel
popiel at wolfskeep.com
Fri Dec 26 13:44:21 EST 2003
In message: <16364.18236.225460.401395 at montanaro.dyndns.org>
Skip Montanaro <skip at pobox.com> writes:
>
> Alex> Also of significant interest is that the classifier doesn't seem
> Alex> to decay as badly over time. With training on everything, the
> Alex> unsure rate in particular (and fn to a much lesser extent) goes up
> Alex> significantly after about 200 days worth of traffic, though the fp
> Alex> rate stays low. With just training on those things that aren't
> Alex> already certain, the unsure rate climbs much more slowly after 200
> Alex> days (with the cumulative rate staying relatively flat), while the
> Alex> fp and fn rates stay at very low values.
>
> Alex> Details of my experiment parameters:
>
> Alex> I've got about 77000 messages in my dataset, covering a span of
> Alex> 418 days. Of these, about 21500 are ham, and nearly 56000 are spam.
> Alex> I include virus/worm messages in my spam, and the "latest windows
> Alex> update" worm makes its presence felt around day 360.
>
>Is it possible that the ham/spam ratio isn't as bad when you don't train on
>everything?
Eyeballing the graphs, the ham/spam ratio looks slightly _more_
unbalanced in the nonedge regime (training only on messages that
aren't already certain), not less.
Also, from looking closer at the 7-day span graphs, I see that the
inflection point is at about 120 days, not 200.
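
For anyone who wants to check the ratio numerically rather than by eye,
here is a minimal sketch of the comparison. It is not the actual test
harness. The cutoffs, the is_certain() rule, and the trained_counts()
helper are all assumptions made for illustration, and the scores are taken
as given rather than produced by a classifier that is retrained along the
way.

```python
from collections import Counter

# Assumed cutoffs, for illustration only.
HAM_CUTOFF = 0.20
SPAM_CUTOFF = 0.90


def is_certain(score, is_spam):
    """True when a message already scores confidently on the correct side."""
    if is_spam:
        return score >= SPAM_CUTOFF
    return score <= HAM_CUTOFF


def trained_counts(messages, nonedge=False):
    """Count (ham, spam) messages that would be trained on.

    messages is an iterable of (score, is_spam) pairs. With nonedge=True,
    messages that are already certain are skipped; otherwise everything
    is counted.
    """
    counts = Counter()
    for score, is_spam in messages:
        if nonedge and is_certain(score, is_spam):
            continue
        counts["spam" if is_spam else "ham"] += 1
    return counts["ham"], counts["spam"]


# Hypothetical scored messages: (score, is_spam).
sample = [(0.05, False), (0.50, False), (0.95, True), (0.60, True), (0.10, True)]
all_ham, all_spam = trained_counts(sample)
ne_ham, ne_spam = trained_counts(sample, nonedge=True)
print("train on everything: %d ham, %d spam" % (all_ham, all_spam))
print("nonedge:             %d ham, %d spam" % (ne_ham, ne_spam))
```

Dividing the spam count by the ham count for each regime gives the ratio
being compared above.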
- Alex