[Spambayes] incremental testing with CL2/CL3?

Brad Clements bkc@murkworks.com
Sun, 06 Oct 2002 14:43:16 -0400


Someone mentioned they did incremental testing and posted their results, but I couldn't 
figure out what the results meant.

So, I want to try it too.

I notice comments like this in TestDriver:

    # CAUTION:  this just doesn't work for incrememental training when
    # options.use_central_limit is in effect.
    def train(self, ham, spam):


I'm not planning on using untrain(), so does this comment still apply?

My plan is:

1. Receive 100 (configurable) messages "per day", with a (configurable) percentage of 
those being spam.

2. Run the classifier on those messages and sort them into 3 categories: ham, spam,
unsure. I want to know how many fall into each category on each "day".

3. Some percentage (configurable) of each category will be fed back into training each
"day".

4. Plot the fn and fp rates "per day" for 30 days (configurable) to show how the rates vary.

5. Vary max_discriminators and the training feedback (% of messages in each category
fed back into the system) against "days" to get a feel for the results a typical user might expect.

6. Re-run the tests using new classifier schemes.

Where do I start?
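
To make the plan above concrete, here is a rough sketch of the per-"day" loop. Everything in it is an assumption for illustration: ToyClassifier is a hypothetical stand-in and not the Spambayes classifier or TestDriver API, the 0.4/0.6 cutoffs and the default feedback fractions are arbitrary, messages are assumed to be pre-tokenized lists of strings, and feedback is assumed to use the true label (as if the user corrected every message fed back).

```python
import random
from collections import Counter


class ToyClassifier:
    """Hypothetical stand-in for a real classifier; NOT the Spambayes API."""

    def __init__(self):
        self.counts = {"ham": Counter(), "spam": Counter()}
        self.nmsgs = {"ham": 0, "spam": 0}

    def train(self, tokens, is_spam):
        label = "spam" if is_spam else "ham"
        self.counts[label].update(set(tokens))
        self.nmsgs[label] += 1

    def score(self, tokens):
        """Crude spam score in [0, 1]; 0.5 means no evidence either way."""
        probs = []
        for tok in set(tokens):
            s = (self.counts["spam"][tok] + 0.5) / (self.nmsgs["spam"] + 1)
            h = (self.counts["ham"][tok] + 0.5) / (self.nmsgs["ham"] + 1)
            probs.append(s / (s + h))
        return sum(probs) / len(probs) if probs else 0.5


def categorize(score, ham_cutoff=0.4, spam_cutoff=0.6):
    """Step 2: three-way split. The cutoffs are arbitrary assumptions."""
    if score < ham_cutoff:
        return "ham"
    if score > spam_cutoff:
        return "spam"
    return "unsure"


def simulate(ham, spam, days=30, per_day=100, spam_fraction=0.5,
             feedback=None, seed=0):
    """Run the daily loop and return one dict of tallies per "day"."""
    # Fraction of each category fed back into training (step 3).
    feedback = feedback or {"ham": 0.1, "spam": 0.1, "unsure": 1.0}
    rng = random.Random(seed)
    ham, spam = list(ham), list(spam)
    rng.shuffle(ham)
    rng.shuffle(spam)
    n_spam = int(per_day * spam_fraction)
    n_ham = per_day - n_spam
    if len(ham) < days * n_ham or len(spam) < days * n_spam:
        raise ValueError("not enough messages for the requested run")

    clf = ToyClassifier()
    results = []
    for day in range(days):
        # Step 1: today's mail, with a fixed share of spam.
        batch = [(m, False) for m in ham[day * n_ham:(day + 1) * n_ham]]
        batch += [(m, True) for m in spam[day * n_spam:(day + 1) * n_spam]]

        # Step 2: classify everything before any of it is trained on.
        bins = {"ham": [], "spam": [], "unsure": []}
        for tokens, is_spam in batch:
            bins[categorize(clf.score(tokens))].append((tokens, is_spam))

        # Step 4: fp = ham called spam, fn = spam called ham.
        fp = sum(1 for _, is_spam in bins["spam"] if not is_spam)
        fn = sum(1 for _, is_spam in bins["ham"] if is_spam)
        results.append({
            "day": day + 1,
            "ham": len(bins["ham"]),
            "spam": len(bins["spam"]),
            "unsure": len(bins["unsure"]),
            "fp_rate": fp / n_ham if n_ham else 0.0,
            "fn_rate": fn / n_spam if n_spam else 0.0,
        })

        # Step 3: feed a share of each category back, using the true label.
        for name, msgs in bins.items():
            k = round(feedback.get(name, 0.0) * len(msgs))
            for tokens, is_spam in rng.sample(msgs, k):
                clf.train(tokens, is_spam)
    return results
```

The list returned by simulate() has one entry per "day", so the fp_rate and fn_rate columns can be plotted directly (step 4), and steps 5 and 6 amount to calling it again with different feedback fractions or with a different classifier object swapped in for ToyClassifier. Only train() is ever called here, never untrain().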


Brad Clements,                bkc@murkworks.com   (315)268-1000
http://www.murkworks.com                          (315)268-9812 Fax
AOL-IM: BKClements