[Spambayes] training problems
Eric S. Johansson
esj at harvee.org
Tue Jul 1 20:49:14 CEST 2008
skip at pobox.com wrote:
> Eric> My current training process is everything above 0.2 is considered
> Eric> good and delivered. Everything below 0.8 is considered bad and
> Eric> dumpstered.
>
> Kind of a large overlap there. Do you mean everything *below* 0.2 is good
> and everything *above* 0.8 is spam?
Whoops. "Above" as in greater than. That is what I get for writing while
distracted. The real code logic is:
On the first pass, messages scoring < 0.2 are delivered and messages scoring
>= 0.8 are dumpstered. The rest are presented to the user.
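
A minimal sketch of that first pass, only to pin down the boundary cases. The function name, the constant names, and the returned labels are made up for illustration and are not from the real code; only the 0.2 and 0.8 cutoffs come from the description above.

```python
# Hypothetical names; only the two cutoff values come from the text above.
HAM_CUTOFF = 0.2
SPAM_CUTOFF = 0.8


def first_pass(score):
    """Route a message by its spam score on the first pass."""
    if score < HAM_CUTOFF:
        return "deliver"    # clearly good, goes straight through
    if score >= SPAM_CUTOFF:
        return "dumpster"   # clearly bad, thrown away
    return "ask_user"       # 0.2 <= score < 0.8, the user decides
```

Note that a score of exactly 0.2 goes to the user and a score of exactly 0.8 is dumpstered, so the two ranges do not overlap.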
After the user determines whether a message is good or bad, train as follows:
  if the message is considered good (scoring >= 0.2), train as good
  if the message is considered bad (scoring < 0.8), train as bad
For example:
if score >= 0.2:
    log("train it green %s" % (score,), 1)
    # retrain as green because the user said so.
    lock_file = file(configuration_data["sb_lock"], "w+")
    locker = simple_locker.locker()
    locker.lock(lock_file, simple_locker.LOCK_EX)
    bayesian_storage = storage.open_storage(configuration_data['sb_features'],
                                            "dbm")
    filter_x = hammie.Hammie(bayesian_storage)
    filter_x.train(tpblue_message.message, False)
    # unlock and close access to lock file
    locker.unlock(lock_file)
    lock_file.close()
    log("TAG learned %s" % result, 1)
    # update yellow limits
    # yellow_limit.do_injection(configuration_data['user_ID'],score)
    # save reputation
    hanky = common_services.reputation_DBM()
    hanky.add_good_reputation(tpblue_message.meta['xforward'], "10")
    hanky = None
    # change state to 'good'
    tpblue_message.alter_state('delivered')
I think I see an unrelated bug. I release the lock before flushing the
spambayes data file to disk.
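
A rough sketch of the ordering that would avoid that: train, flush, and only then release the lock. This is not the real code. It uses the standard fcntl module (Unix only) in place of simple_locker, and RecordingStore is a hypothetical stand-in that only records the order of calls. In the real code the flush would presumably be something like a store() call on the Hammie object, but that is an assumption to check against the installed SpamBayes version.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path for the whole block."""
    lock_file = open(lock_path, "w+")
    try:
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        yield lock_file
    finally:
        fcntl.flock(lock_file, fcntl.LOCK_UN)
        lock_file.close()


class RecordingStore:
    """Hypothetical stand-in for the token database; records call order only."""

    def __init__(self):
        self.events = []

    def train(self, message, is_spam):
        self.events.append("train")

    def store(self):
        self.events.append("store")


def train_and_flush(store, message, is_spam, lock_path):
    with exclusive_lock(lock_path):
        store.train(message, is_spam)
        store.store()  # flush to disk while the lock is still held
    # the lock is released only after the flush has finished


lock_path = os.path.join(tempfile.mkdtemp(), "sb_lock")
demo = RecordingStore()
train_and_flush(demo, "some message text", False, lock_path)
```

Putting the unlock in a finally clause also means the lock is released if training raises, which the straight-line version does not guarantee.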