[spambayes-dev] Reduced training test results
skip at pobox.com
Fri Dec 26 09:35:40 EST 2003
Alex> Also of significant interest is that the classifier doesn't seem
Alex> to decay as badly over time. With training on everything, the
Alex> unsure rate in particular (and fn to a much lesser extent) goes up
Alex> significantly after about 200 days' worth of traffic, though the fp
Alex> rate stays low. With just training on those things that aren't
Alex> already certain, the unsure rate climbs much more slowly after 200
Alex> days (with the cumulative rate staying relatively flat), while the
Alex> fp and fn rates stay at very low values.
Alex> Details of my experiment parameters:
Alex> I've got about 77000 messages in my dataset, covering a span of
Alex> 418 days. Of these, about 21500 are ham, and nearly 56000 are spam.
Alex> I include virus/worm messages in my spam, and the "latest windows
Alex> update" worm makes its presence felt around day 360.
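The "train only on what isn't already certain" regime Alex describes can be sketched as below. This is a minimal illustration under stated assumptions, not SpamBayes' actual code or API. `ToyClassifier`, `tokenize`, `classify` and `run_reduced_training` are hypothetical names. The scorer is a simple smoothed token log-odds model rather than SpamBayes' chi-squared combining, and the 0.20/0.90 cutoffs are assumed values. "Not already certain" is read here as "scored unsure or misclassified". The sketch also counts how many ham and spam messages actually get trained, which is what determines the ham/spam ratio the classifier ends up learning from.

```python
import math
import re
from collections import Counter

# Assumed SpamBayes-style cutoffs; real deployments tune these.
HAM_CUTOFF = 0.20
SPAM_CUTOFF = 0.90


def tokenize(text):
    # Hypothetical tokenizer: lowercase word tokens, deduplicated per message.
    return set(re.findall(r"[a-z0-9']+", text.lower()))


class ToyClassifier:
    """Hypothetical token-counting scorer (not SpamBayes' chi-squared combining)."""

    def __init__(self):
        self.ham_counts = Counter()
        self.spam_counts = Counter()
        self.nham = 0
        self.nspam = 0

    def learn(self, tokens, is_spam):
        if is_spam:
            self.spam_counts.update(tokens)
            self.nspam += 1
        else:
            self.ham_counts.update(tokens)
            self.nham += 1

    def score(self, tokens):
        # Spam probability in (0, 1); 0.5 until both classes have been trained.
        if self.nham == 0 or self.nspam == 0:
            return 0.5
        log_odds = 0.0
        for tok in tokens:
            # Fraction of trained messages per class containing the token,
            # with add-one smoothing so unseen tokens are neutral-ish.
            p_spam = (self.spam_counts[tok] + 1) / (self.nspam + 2)
            p_ham = (self.ham_counts[tok] + 1) / (self.nham + 2)
            log_odds += math.log(p_spam / p_ham)
        log_odds = max(-50.0, min(50.0, log_odds))  # avoid exp() overflow
        return 1.0 / (1.0 + math.exp(-log_odds))


def classify(score):
    if score < HAM_CUTOFF:
        return "ham"
    if score > SPAM_CUTOFF:
        return "spam"
    return "unsure"


def run_reduced_training(messages):
    """messages: iterable of (text, is_spam) pairs in arrival order."""
    clf = ToyClassifier()
    stats = Counter()
    for text, is_spam in messages:
        tokens = tokenize(text)
        verdict = classify(clf.score(tokens))
        truth = "spam" if is_spam else "ham"
        if verdict == "unsure":
            stats["unsure"] += 1
        elif verdict != truth:
            stats["fp" if truth == "ham" else "fn"] += 1
        # Reduced training: skip messages that were classified correctly and
        # confidently; train on unsure and misclassified ones only.
        if verdict != truth:
            clf.learn(tokens, is_spam)
            stats["trained_spam" if is_spam else "trained_ham"] += 1
    return stats
```

For example, `run_reduced_training([("cheap pills buy now", True), ("project meeting notes today", False)] * 5)` trains on only the first two messages (both unsure while the classifier is empty) and then scores the remaining eight confidently and correctly without further training. Changing the `if verdict != truth:` test to always train gives the "train on everything" baseline for comparison.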
Is it possible that the ham/spam ratio isn't as bad when you don't train on