[Spambayes] Performance of CVS Outlook addin

Mark Hammond mhammond at skippinet.com.au
Thu May 22 13:17:04 EDT 2003


[Tim]
> near 2MB yet, so I'm not much help.  What is the slowdown proportional to,
> roughly?  To the number of tokens in the message just trained on, or (as it
> would be in the case of a dict) the number of tokens in the database?  I was
> sure hoping that under "a real" database, "# of tokens in the msg" would be
> the answer, *and* that updating a few hundred token records would go too
> fast to notice.

Me too :)  It certainly is proportional to the number of tokens, but I also
suspect it depends on the "layout" of the DB.  I have never seen an
incremental save take longer than 1 second on my (fairly fast CPU, average
disks) machine - but 1 second is borderline too slow.  My DB is currently
5MB.
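
For illustration, here is a minimal sketch of what "proportional to the
number of tokens in the message" would look like for an incremental save.
It is not the Spambayes storage code: the function names, the
(spam, ham) tuple layout and the dict-like `store` are all assumptions
made up for this example.

```python
def train_on_message(counts, dirty, tokens, is_spam):
    """Update in-memory counts and remember which tokens changed.

    `counts` maps token -> (spam_count, ham_count); this layout is an
    assumption for the sketch, not the real database format.
    """
    for tok in set(tokens):  # count each token once per message
        spam, ham = counts.get(tok, (0, 0))
        counts[tok] = (spam + 1, ham) if is_spam else (spam, ham + 1)
        dirty.add(tok)


def incremental_save(store, counts, dirty):
    """Write only the changed records to `store`; return how many were written."""
    written = 0
    for tok in dirty:
        store[tok] = counts[tok]
        written += 1
    dirty.clear()
    return written


# `store` is a plain dict here; any dict-like persistent mapping could stand in.
counts, dirty, store = {}, set(), {}
train_on_message(counts, dirty, ["cheap", "pills", "cheap"], is_spam=True)
n_written = incremental_save(store, counts, dirty)
```

The number of writes depends only on the distinct tokens in the message.
How long each write takes is up to the underlying database, which is where
the "layout" would come in.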

Certainly saving the database after a complete retrain takes nearly a
minute.  I was going to look at removing all hapaxes after a complete
retrain to try to speed that one up too (I recall a report here that a
huge number of the tokens were hapaxes - I have yet to confirm that with my
database).
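
A rough sketch of how the hapax count could be checked, and the hapaxes
dropped, before a full save.  It assumes the token counts are available as
a plain dict of token -> (spam_count, ham_count) and treats a hapax as a
token seen in exactly one trained message; both the layout and the
function name are hypothetical, not the Spambayes API.

```python
def split_hapaxes(counts):
    """Separate tokens seen in exactly one message from the rest.

    `counts` maps token -> (spam_count, ham_count); this layout is an
    assumption for the sketch.
    """
    kept = {}
    hapaxes = {}
    for tok, (spam, ham) in counts.items():
        if spam + ham == 1:
            hapaxes[tok] = (spam, ham)
        else:
            kept[tok] = (spam, ham)
    return kept, hapaxes


# Made-up counts, purely to exercise the function.
counts = {"cheap": (3, 0), "meeting": (0, 5), "xyzzy": (1, 0), "plugh": (0, 1)}
kept, hapaxes = split_hapaxes(counts)
hapax_fraction = len(hapaxes) / len(counts)
```

Saving only `kept` would shrink the full save by whatever `hapax_fraction`
turns out to be on a real database, at the cost of the classifier no
longer knowing about those tokens.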

Mark.



