[spambayes-dev] RE: [Spambayes] How low can you go?

Seth Goodman sethg at GoodmanAssociates.com
Mon Dec 22 14:06:14 EST 2003


> [Tim Peters]
> Sorry, I can't make time to reply now.  Your original message is still
> sitting in my queue (actually, several of your msgs are -- you write a
> lot, you know <wink>), and I'll get to it when I can.

Sorry about that.  I'll try to keep the noise level down.

> > [Seth Goodman]
> > so I don't want to go off on a useless fork with the present db if
> > that comes to pass.
>
> [Tim Peters]
> Try to say something more specific about what you want to investigate, and
> you'll probably get a better answer.

I would like to investigate whole-message expiration with different training
and expiration schemes.  From our previous discussion, it seems that the
most flexible approach is a system with several bidirectional maps
implemented in the databases:  feature_id <-> token, and msg_id (+ training
timestamp) <-> feature_id, plus a token database with a training timestamp
per entry.  Instead of a training timestamp, an expiration time might be
preferable.
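
To make the idea concrete, here is a minimal Python sketch of the maps
described above.  It is only an illustration under stated assumptions:  the
class and method names (ExpiringTrainingStore, train, untrain,
expire_older_than) are hypothetical and not taken from the SpamBayes code,
plain in-memory dicts stand in for the real databases, and each message
counts a token at most once.

```python
from collections import defaultdict


class ExpiringTrainingStore:
    """Hypothetical in-memory sketch of the proposed maps (not SpamBayes code)."""

    def __init__(self):
        self.token_to_id = {}                 # token -> feature_id
        self.id_to_token = {}                 # feature_id -> token
        self.msg_info = {}                    # msg_id -> (is_spam, trained_at, feature_ids)
        self.feature_msgs = defaultdict(set)  # feature_id -> msg_ids trained on it
        self.counts = {}                      # feature_id -> [spam_count, ham_count]
        self.nspam = 0
        self.nham = 0
        self._next_id = 0                     # never reused, so ids stay unique

    def _feature_id(self, token):
        """Return the feature_id for a token, allocating one if needed."""
        fid = self.token_to_id.get(token)
        if fid is None:
            fid = self._next_id
            self._next_id += 1
            self.token_to_id[token] = fid
            self.id_to_token[fid] = token
            self.counts[fid] = [0, 0]
        return fid

    def train(self, msg_id, tokens, is_spam, trained_at):
        """Record a message, its training timestamp, and its features."""
        if msg_id in self.msg_info:
            raise ValueError("message already trained: %r" % (msg_id,))
        fids = {self._feature_id(tok) for tok in set(tokens)}
        col = 0 if is_spam else 1
        for fid in fids:
            self.counts[fid][col] += 1
            self.feature_msgs[fid].add(msg_id)
        self.msg_info[msg_id] = (is_spam, trained_at, fids)
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1

    def untrain(self, msg_id):
        """Remove one whole message and everything only it contributed."""
        is_spam, _, fids = self.msg_info.pop(msg_id)
        col = 0 if is_spam else 1
        for fid in fids:
            self.counts[fid][col] -= 1
            self.feature_msgs[fid].discard(msg_id)
            if not self.feature_msgs[fid]:
                # No remaining message uses this feature: drop it entirely.
                del self.feature_msgs[fid]
                del self.counts[fid]
                del self.token_to_id[self.id_to_token.pop(fid)]
        if is_spam:
            self.nspam -= 1
        else:
            self.nham -= 1

    def expire_older_than(self, cutoff):
        """Untrain every message trained before cutoff; return their ids."""
        expired = sorted(m for m, info in self.msg_info.items()
                         if info[1] < cutoff)
        for msg_id in expired:
            self.untrain(msg_id)
        return expired
```

Storing an expiration time instead of the training timestamp would only
change what is kept in msg_info and the comparison in expire_older_than.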

If none of this exists, I guess I need to start there.  I was hoping that
some of this might exist since you are already experimenting with hapax
expiration.  I thought I read that there was experimental code that mapped
tokens to the message_ids they were trained from, but that may have been
wishful thinking.  In any case, these are all significant database changes,
and I was afraid to go off half-cocked if the underlying database was not
going to hang around.  Any advice on how to proceed would be appreciated.
If this is too ambitious for a first project, please help me pare it down.

--
Seth Goodman

  Humans:   off-list replies to sethg [at] GoodmanAssociates [dot] com

  Spambots: disregard the above



