[Spambayes] Performance of CVS Outlook addin
Mark Hammond
mhammond at skippinet.com.au
Thu May 22 13:17:04 EDT 2003
[Tim]
> near 2MB yet, so I'm not much help. What is the slowdown proportional to,
> roughly? To the number of tokens in the message just trained on, or (as it
> would be in the case of a dict) the number of tokens in the database? I was
> sure hoping that under "a real" database, "# of tokens in the msg" would be
> the answer, *and* that updating a few hundred token records would go too
> fast to notice.
Me too :) It certainly is proportional to the number of tokens, but I also
suspect it depends on the "layout" of the DB. I have never seen an
incremental save take longer than 1 second on my (fairly fast CPU, average
disks) machine - but 1 second is borderline too slow. My DB is currently
5MB.
Certainly saving the database after a complete retrain takes nearly a
minute. I was going to look at removing all hapaxes after a complete
retrain to try and speed that one up too (as I recall a report here that a
huge number of the tokens were hapaxes - I am yet to confirm that with my
database).
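Roughly what removing hapaxes could look like, as a minimal sketch only. It assumes the token database can be treated as a plain dict mapping each token to a (spamcount, hamcount) pair; that layout, the name remove_hapaxes, and the toy data are all made up for illustration and are not the actual storage code.

```python
def remove_hapaxes(wordinfo):
    """Return a copy of wordinfo without tokens seen exactly once.

    wordinfo is assumed (hypothetically) to map
    token -> (spamcount, hamcount).
    """
    return {
        token: (spam, ham)
        for token, (spam, ham) in wordinfo.items()
        if spam + ham > 1  # keep only tokens seen at least twice
    }


# Toy data, invented for illustration only.
db = {
    "viagra": (12, 0),
    "meeting": (0, 7),
    "xq7zt": (1, 0),  # hapax: seen once, in spam
    "foo": (0, 1),    # hapax: seen once, in ham
}

pruned = remove_hapaxes(db)
print(len(db) - len(pruned), "hapaxes removed")
```

One thing to keep in mind with this approach: a token dropped this way starts counting from zero again if it shows up in a later message, so how much it helps would need checking against a real database.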
Mark.