[Spambayes] training problems

Eric S. Johansson esj at harvee.org
Tue Jul 1 20:49:14 CEST 2008


skip at pobox.com wrote:
>     Eric> My current training process is everything above 0.2 is considered
>     Eric> good and delivered.  Everything below 0.8 is considered bad and
>     Eric> dumpstered.  
> 
> Kind of a large overlap there.  Do you mean everything *below* 0.2 is good
> and everything *above* 0.8 is spam?

Whoops, I had above and below swapped. That is what I get for writing while
distracted. The real code logic is:

On the first pass, messages scoring < 0.2 are delivered and messages scoring
>= 0.8 are dumpstered. The rest are presented to the user.
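
A minimal sketch of that first pass, just to pin down the boundary cases. The
function name, constant names, and return strings are made up for illustration
and are not taken from the real code:

```python
# Thresholds as described above; the names are hypothetical.
DELIVER_BELOW = 0.2   # scores strictly below this are delivered
DUMPSTER_FROM = 0.8   # scores at or above this are dumpstered


def first_pass(score):
    """Return the first-pass action for a message with the given score."""
    if score < DELIVER_BELOW:
        return "deliver"
    if score >= DUMPSTER_FROM:
        return "dumpster"
    # Everything in [0.2, 0.8) is shown to the user for a decision.
    return "ask_user"
```

Note that a score of exactly 0.2 goes to the user, while a score of exactly 0.8
is dumpstered.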

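After the user decides whether a message is good or bad, train as follows.

If the user says the message is good and it scored >= 0.2 (so it was not
delivered automatically), train as good.

If the user says the message is bad and it scored < 0.8 (so it was not
dumpstered automatically), train as bad.

For example, the good case: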

if score >= 0.2:
   log( "train it green %s"% (score,),1)
   # retrain as green because the user said so.
   lock_file = file(configuration_data["sb_lock"], "w+")
   locker = simple_locker.locker()
   locker.lock(lock_file, simple_locker.LOCK_EX)

   bayesian_storage = storage.open_storage(configuration_data['sb_features'], "dbm")
   filter_x = hammie.Hammie(bayesian_storage)
   filter_x.train(tpblue_message.message,False)
   # unlock and close access to lock file
   locker.unlock(lock_file)
   lock_file.close()

   log("TAG  learned %s" % result,1)
   # update yellow limits
   # yellow_limit.do_injection(configuration_data['user_ID'],score)

   # save reputation
   hanky = common_services.reputation_DBM()
   hanky.add_good_reputation(tpblue_message.meta['xforward'], "10")
   hanky = None

   # change state to 'good'
   tpblue_message.alter_state('delivered')


I think I see an unrelated bug: I release the lock before flushing the
SpamBayes data file to disk.
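
A rough sketch of the ordering I think it should have: train, flush, and only
then unlock. This is not the real code. The recording classes below are
hypothetical stand-ins for the lock file and the trained database, and the
flush call is named store() here as an assumption about what the storage
wrapper offers:

```python
import threading


class RecordingLock(object):
    """Hypothetical stand-in for the file lock; records when it is taken."""

    def __init__(self, events):
        self.events = events
        self._lock = threading.Lock()

    def acquire(self):
        self._lock.acquire()
        self.events.append("lock")

    def release(self):
        self.events.append("unlock")
        self._lock.release()


class RecordingStore(object):
    """Hypothetical stand-in for the trained database."""

    def __init__(self, events):
        self.events = events

    def train(self, message, is_spam):
        self.events.append("train")

    def store(self):
        # Assumed name for the call that flushes the data file to disk.
        self.events.append("store")


def train_under_lock(lock, db, message, is_spam):
    """Train and flush while the lock is held, then release it."""
    lock.acquire()
    try:
        db.train(message, is_spam)
        db.store()  # flush before anyone else can open the data file
    finally:
        lock.release()  # released even if training or flushing fails


events = []
train_under_lock(RecordingLock(events), RecordingStore(events), "msg", False)
print(events)
```

The try/finally also means the lock is not left held if training raises.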


