[spambayes-dev] auto-training w/ small db seems like a bad idea

Skip Montanaro skip at pobox.com
Tue Jan 13 15:13:53 EST 2004


    >> "small DB/non-edge training" may very well be a great idea.  In fact,
    >> I started from scratch yesterday based upon your email praising the
    >> idea.  Auto-training with small databases where you are likely to get
    >> more false positives seems like a bad idea though.

    Eli> Is there a regime that simulates what you had been doing manually?

Dunno.  I haven't really looked at the incremental training stuff.  

    Eli> *nudge*wink*nudge* I've actually been wanting to dig into the
    Eli> incremental.py stuff more; if you could provide more detail about
    Eli> how you choose your initial training set, etc. I'd be happy to try
    Eli> whipping something up when more free time rolls around (likely not
    Eli> until this weekend).

I choose my initial training set by whatever strikes me at the moment.
Generally that means picking a few hams and spams, as few as one of each,
often from the most recent hams and spams which I am about to throw away.
Other times I pick a couple Python messages and a recent spam or two from
one of my spam mailboxes.

As far as I can tell, there's no obvious best way to pick those initial few
messages.
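
For anyone wanting to simulate that selection, here is a rough sketch, not
anything taken from incremental.py. It assumes each mailbox is available as a
list of (timestamp, message_id) pairs; that representation, the function name
pick_initial_training_set, and its parameters are all hypothetical and are not
part of spambayes.

```python
import random


def pick_initial_training_set(hams, spams, n_each=1, recent_only=True, seed=None):
    """Pick a tiny seed set: up to n_each hams and n_each spams.

    hams and spams are lists of (timestamp, message_id) pairs. This is a
    hypothetical representation, not a spambayes data structure.
    """
    rng = random.Random(seed)

    def pick(msgs):
        if recent_only:
            # Sort newest first and keep the top n_each message ids.
            return [mid for _, mid in sorted(msgs, reverse=True)[:n_each]]
        # Otherwise choose arbitrarily, mimicking "whatever strikes me".
        return [mid for _, mid in rng.sample(msgs, min(n_each, len(msgs)))]

    return pick(hams), pick(spams)


if __name__ == "__main__":
    hams = [(1, "h1"), (3, "h3"), (2, "h2")]
    spams = [(5, "s5"), (4, "s4")]
    # One of each, taken from the most recent messages.
    print(pick_initial_training_set(hams, spams))
```

With n_each=1 this gives the "as few as one of each" case; passing
recent_only=False switches to an arbitrary pick instead of the most recent
messages.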

Skip



