[spambayes-dev] Reduced training test results

Skip Montanaro skip at pobox.com
Fri Dec 26 09:35:40 EST 2003


    Alex> Also of significant interest is that the classifier doesn't seem
    Alex> to decay as badly over time.  With training on everything, the
    Alex> unsure rate in particular (and fn to a much lesser extent) goes up
    Alex> significantly after about 200 days worth of traffic, though the fp
    Alex> rate stays low.  With just training on those things that aren't
    Alex> already certain, the unsure rate climbs much more slowly after 200
    Alex> days (with the cumulative rate staying relatively flat), while the
    Alex> fp and fn rates stay at very low values.

    Alex> Details of my experiment parameters:

    Alex> I've got about 77000 messages in my dataset, covering a span of
    Alex> 418 days.  Of these, about 21500 are ham, and nearly 56000 are spam.
    Alex> I include virus/worm messages in my spam, and the "latest windows
    Alex> update" worm makes its presence felt around day 360.

Is it possible that the ham/spam ratio of the messages you actually train
on isn't as lopsided when you don't train on everything?
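
One way to check would be to count what each regime actually trains on.
Below is a minimal sketch of that idea, not SpamBayes code: ToyClassifier,
trained_counts, and the 0.20/0.90 cutoffs are all assumptions made up for
illustration, and the scoring is a crude token-count ratio rather than the
real classifier's math.

```python
from collections import Counter


class ToyClassifier:
    """Hypothetical stand-in for a real classifier; scores by raw token counts."""

    def __init__(self):
        self.spam_tokens = Counter()
        self.ham_tokens = Counter()

    def train(self, text, is_spam):
        target = self.spam_tokens if is_spam else self.ham_tokens
        target.update(text.split())

    def score(self, text):
        tokens = text.split()
        s = sum(self.spam_tokens[t] for t in tokens)
        h = sum(self.ham_tokens[t] for t in tokens)
        # No evidence either way: report "unsure" (0.5).
        return 0.5 if s + h == 0 else s / (s + h)


def trained_counts(messages, train_all, ham_cutoff=0.20, spam_cutoff=0.90):
    """Return {"ham": n, "spam": m}: how many messages of each kind got trained.

    messages is a sequence of (text, is_spam) pairs in arrival order.
    With train_all=False, a message is trained only when the classifier
    is not already certain and correct about it.
    """
    clf = ToyClassifier()
    counts = {"ham": 0, "spam": 0}
    for text, is_spam in messages:
        p = clf.score(text)
        certain = (p >= spam_cutoff) if is_spam else (p <= ham_cutoff)
        if train_all or not certain:
            clf.train(text, is_spam)
            counts["spam" if is_spam else "ham"] += 1
    return counts


if __name__ == "__main__":
    msgs = [("buy pills", True)] * 3 + [("lunch meeting", False)]
    print("train on everything:", trained_counts(msgs, train_all=True))
    print("train on uncertain: ", trained_counts(msgs, train_all=False))
```

On that toy input, training on everything sees 3 spam to 1 ham, while
training only on the uncertain messages sees 1 spam to 1 ham, because the
repeated spam is already scored as certain after the first copy.  Whether
anything like that happens on a real corpus is exactly the question.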

Skip


