[spambayes-dev] auto-training w/ small db seems like a bad idea
Skip Montanaro
skip at pobox.com
Tue Jan 13 15:13:53 EST 2004
>> "small DB/non-edge training" may very well be a great idea. In fact,
>> I started from scratch yesterday based upon your email praising the
>> idea. Auto-training with small databases where you are likely to get
>> more false positives seems like a bad idea though.
Eli> Is there a regime that simulates what you had been doing manually?
Dunno. I haven't really looked at the incremental training stuff.
Eli> *nudge*wink*nudge* I've actually been wanting to dig into the
Eli> incremental.py stuff more; if you could provide more detail about
Eli> how you choose your initial training set, etc. I'd be happy to try
Eli> whipping something up when more free time rolls around (likely not
Eli> until this weekend).
I choose my initial training set by whatever strikes me at the moment.
Generally that means picking a few hams and spams, as few as one of each,
often from the most recent hams and spams which I am about to throw away.
Other times I pick a couple of Python messages and a recent spam or two from
one of my spam mailboxes.
As far as I can tell, there's no obvious best way to pick those initial few
messages.
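
For anyone who wants to simulate the "most recent hams and spams" variant
described above, here is a minimal sketch. It is not part of SpamBayes or
incremental.py. The function name, the per_class parameter, and the
(timestamp, text) tuple shape are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical message shape: (received timestamp, raw message text).
Message = Tuple[float, str]


def pick_initial_training_set(
    hams: List[Message],
    spams: List[Message],
    per_class: int = 1,
) -> Tuple[List[Message], List[Message]]:
    """Return the `per_class` most recent hams and spams, newest first.

    This covers only the "most recent messages" habit; picking by whim
    has no algorithm to simulate.
    """
    if per_class < 1:
        raise ValueError("per_class must be at least 1")

    def newest_first(messages: List[Message]) -> List[Message]:
        # Sort on the timestamp field, most recent message first.
        return sorted(messages, key=lambda m: m[0], reverse=True)

    return newest_first(hams)[:per_class], newest_first(spams)[:per_class]
```

Calling it with per_class=1 gives the "as few as one of each" case; a
simulation could then feed the two returned lists to whatever trainer is
under test.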
Skip