[Spambayes] Corpus modules

François Granger francois.granger@free.fr
Wed Nov 13 13:13:16 2002


on 13/11/02 3:13, Tim Stone - Four Stones Expressions at
tim@fourstonesExpressions.com wrote:

> Hammie has interesting PersistentBayes and DB_Dict classes, with some helper
> functions for bayes object creation.  It seems to me that a more cogent class
> hierarchy is called for, with Bayes being the abstract class, PersistentBayes
> being an abstract subclass, and subclasses of that for particular persistence
> mechanisms, like PickleBayes, ZODBBayes, DBDictBayes, etc. etc.

I was thinking of hacking the DB mechanism to split the load between two
databases (using anydbm), to reduce access to each one and to make them more
accessible from outside. The scoring module needs only the second one; the
training module would update both. I suspected that a major redesign was
underway. Here is the proposed split:
{'word': ['ltime',     # when this record was last modified
          'spamcount', # of spams in which this word appears
          'hamcount',  # of hams in which this word appears
         ]
}
{'word': ['atime',     # when this record was last used by scoring(*)
          'killcount', # of times this made it to spamprob()'s nbest
          'spamprob',  # prob(spam | msg contains this word)
          ]
}
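
To make the split concrete, here is a minimal sketch of how training and
scoring might use the two databases. It assumes records are pickled lists
stored under byte-string keys, so any dbm-style mapping (or a plain dict)
works. The names train, lookup and naive_spamprob are hypothetical, and
naive_spamprob is only a placeholder estimate, not the Spambayes formula.

```python
import pickle
import time


def naive_spamprob(spamcount, hamcount):
    """Placeholder estimate (an assumption, NOT the Spambayes formula)."""
    total = spamcount + hamcount
    return spamcount / total if total else 0.5


def train(counts_db, scores_db, word, is_spam, now=None):
    """Training touches both databases."""
    now = time.time() if now is None else now
    key = word.encode("utf-8")

    # First database: [ltime, spamcount, hamcount]
    if key in counts_db:
        _ltime, spamcount, hamcount = pickle.loads(counts_db[key])
    else:
        spamcount = hamcount = 0
    if is_spam:
        spamcount += 1
    else:
        hamcount += 1
    counts_db[key] = pickle.dumps([now, spamcount, hamcount])

    # Second database: [atime, killcount, spamprob]
    if key in scores_db:
        atime, killcount, _old_prob = pickle.loads(scores_db[key])
    else:
        atime, killcount = 0.0, 0
    prob = naive_spamprob(spamcount, hamcount)
    scores_db[key] = pickle.dumps([atime, killcount, prob])


def lookup(scores_db, word, now=None):
    """Scoring reads and updates only the second database."""
    now = time.time() if now is None else now
    key = word.encode("utf-8")
    if key not in scores_db:
        return None
    _atime, killcount, spamprob = pickle.loads(scores_db[key])
    # Record when this word was last used by scoring.
    scores_db[key] = pickle.dumps([now, killcount, spamprob])
    return spamprob
```

With this layout the scoring side never opens the first database, which is
the point of the split.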

A 'dirty' flag could be added to the records of the first database so that a
batch update of the second would recalculate only the dirty records.
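
A minimal sketch of the dirty-flag idea follows, using plain dicts as
stand-ins for the two databases. The fourth field of the first record, the
function names mark_trained and refresh_dirty, and the estimate callback
(which turns the two counts into a spamprob) are all assumptions made for
illustration.

```python
def mark_trained(counts, word, is_spam, now):
    """Update the counts for a word and flag its record as dirty."""
    _ltime, spamcount, hamcount, _dirty = counts.get(word, [now, 0, 0, False])
    if is_spam:
        spamcount += 1
    else:
        hamcount += 1
    # Record layout here: [ltime, spamcount, hamcount, dirty]
    counts[word] = [now, spamcount, hamcount, True]


def refresh_dirty(counts, scores, estimate):
    """Recalculate spamprob only for dirty records; return how many."""
    refreshed = 0
    for word, record in list(counts.items()):
        ltime, spamcount, hamcount, dirty = record
        if not dirty:
            continue
        # Keep atime and killcount; replace only spamprob.
        atime, killcount, _old_prob = scores.get(word, [0.0, 0, None])
        scores[word] = [atime, killcount, estimate(spamcount, hamcount)]
        counts[word] = [ltime, spamcount, hamcount, False]
        refreshed += 1
    return refreshed
```

A second call to refresh_dirty right after the first does no work, since
every flag has been cleared.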

-- 
Mail is a means of communication. People should ask themselves
questions about the political implications of their choices (or
non-choices) of tools and technologies. For clean mail:
<http://marc.herbert.free.fr/mail/> -- <http://minilien.com/?IXZneLoID0>



