[spambayes-dev] Reduced training test results

T. Alexander Popiel popiel at wolfskeep.com
Fri Dec 26 13:44:21 EST 2003


In message:  <16364.18236.225460.401395 at montanaro.dyndns.org>
             Skip Montanaro <skip at pobox.com> writes:
>
>    Alex> Also of significant interest is that the classifier doesn't seem
>    Alex> to decay as badly over time.  With training on everything, the
>    Alex> unsure rate in particular (and fn to a much lesser extent) goes up
>    Alex> significantly after about 200 days worth of traffic, though the fp
>    Alex> rate stays low.  With just training on those things that aren't
>    Alex> already certain, the unsure rate climbs much more slowly after 200
>    Alex> days (with the cumulative rate staying relatively flat), while the
>    Alex> fp and fn rates stay at very low values.
>
>    Alex> Details of my experiment parameters:
>
>    Alex> I've got about 77000 messages in my dataset, covering a span of
>    Alex> 418 days.  Of these, about 21500 are ham, and nearly 56000 are spam.
>    Alex> I include virus/worm messages in my spam, and the "latest windows
>    Alex> update" worm makes its presence felt around day 360.
>
>Is it possible that the ham/spam ratio isn't as bad when you don't train on
>everything? 

Eyeballing the graphs, the ham/spam ratio looks slightly _more_
unbalanced under the nonedge regime (training only on messages
that aren't already certain), not less.
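
A minimal sketch of how that ratio could be checked by counting
rather than by eye. It assumes each message comes with a true label
and a score in [0, 1] recorded at the time it was seen. The function
name, the (is_spam, score) layout, and the 0.05 edge margin are all
hypothetical; none of them come from the actual test harness.

```python
def trained_counts(messages, edge=0.05):
    """Count the ham and spam that a 'train unless already certain'
    regime would train on.

    messages: iterable of (is_spam, score) pairs, score in [0, 1].
    edge: hypothetical margin; a message counts as 'already certain'
          when it is correctly scored within this distance of 0 or 1.
    Returns (ham_trained, spam_trained).
    """
    ham = spam = 0
    for is_spam, score in messages:
        if is_spam:
            certain = score >= 1.0 - edge
        else:
            certain = score <= edge
        if certain:
            continue  # already certain and correct, so not trained
        if is_spam:
            spam += 1
        else:
            ham += 1
    return ham, spam


if __name__ == "__main__":
    sample = [(True, 0.99), (True, 0.50), (False, 0.01),
              (False, 0.30), (True, 0.97), (False, 0.96)]
    ham, spam = trained_counts(sample)
    print("trained ham: %d, trained spam: %d" % (ham, spam))
```

In a real run the scores change as training proceeds, so the pairs
would have to be logged as each message is scored, not computed
afterwards against the final database.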

Also, looking more closely at the 7-day span graphs, I see that the
inflection point (where the unsure rate starts to climb) is at about
120 days, not 200.

- Alex


