[spambayes-dev] RE: [Spambayes] How low can you go?
Skip Montanaro
skip at pobox.com
Mon Dec 22 14:55:41 EST 2003
Seth> I would like to investigate whole message expiration with
Seth> different training and expiration schemes. From our previous
Seth> discussion, it seems that the most flexible way to approach this
Seth> is by going to a system with several bidirectional maps
Seth> implemented in the databases: feature_id <-> token, msg_id (+
Seth> training timestamp) <-> feature_id and token database w/training
Seth> timestamp per entry. Instead of training timestamp, expiration
Seth> time might be preferable.
I'll just toss out a thought with nothing really to back it up besides my
seat-of-the-pants experience. You might find it easier to experiment with
different table layouts using SQL. There are both MySQL and PostgreSQL
classifiers available (browse spambayes/storage.py). You could add new
tables or new columns to existing tables without much fuss. Also, hapax
expiration would be pretty simple. (Add a last_used column, arrange for it
to get updated whenever a row is fetched, which is fairly trivial with
PostgreSQL's triggers I think, then use it to expire hapaxes periodically.)
Finally, problems of multi-thread or multi-process access to the database
should go away.
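
To make the hapax idea concrete, here is a rough sketch, not anything that
exists in spambayes/storage.py. It uses Python's sqlite3 module purely as a
stand-in so it runs without a server; the table name, the column names and
the helper functions (train, fetch, expire_hapaxes) are all made up for
illustration. One caveat: PostgreSQL triggers fire on INSERT, UPDATE and
DELETE rather than on SELECT, so this sketch refreshes last_used in the code
that does the fetch instead of relying on a trigger.

```python
import sqlite3

# Hypothetical schema: one row per token plus a last_used timestamp.
SCHEMA = """
CREATE TABLE bayes (
    word      TEXT PRIMARY KEY,
    nspam     INTEGER NOT NULL DEFAULT 0,
    nham      INTEGER NOT NULL DEFAULT 0,
    last_used REAL NOT NULL
)
"""


def train(conn, word, is_spam, now):
    """Count one training occurrence of word and stamp it as used."""
    column = "nspam" if is_spam else "nham"  # fixed set, never user input
    cur = conn.execute(
        f"UPDATE bayes SET {column} = {column} + 1, last_used = ? "
        "WHERE word = ?",
        (now, word),
    )
    if cur.rowcount == 0:  # token not seen before, so create its row
        conn.execute(
            "INSERT INTO bayes (word, nspam, nham, last_used) "
            "VALUES (?, ?, ?, ?)",
            (word, int(is_spam), int(not is_spam), now),
        )


def fetch(conn, word, now):
    """Return (nspam, nham) for word, refreshing last_used on a hit."""
    row = conn.execute(
        "SELECT nspam, nham FROM bayes WHERE word = ?", (word,)
    ).fetchone()
    if row is not None:
        conn.execute(
            "UPDATE bayes SET last_used = ? WHERE word = ?", (now, word)
        )
    return row


def expire_hapaxes(conn, cutoff):
    """Delete tokens trained exactly once and not used since cutoff."""
    cur = conn.execute(
        "DELETE FROM bayes WHERE nspam + nham = 1 AND last_used < ?",
        (cutoff,),
    )
    return cur.rowcount  # number of hapaxes removed
```

Run expire_hapaxes from a periodic job with a cutoff of, say, "now minus N
days". A hapax that keeps turning up in scored mail survives, because each
fetch refreshes its timestamp, while one that is never seen again gets
dropped.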
Skip