Charles Cazabon python-spambayes@discworld.dyndns.org
Wed Nov 6 20:16:09 2002

Tim Peters <tim.one@comcast.net> wrote:
> It will also create a database size problem:  without a strategy for pruning
> useless words, the database will grow without bounds (an intuition that at a
> certain non-fantastic size, "all words" will have been seen is incorrect for
> computer-based indexing apps, and especially for email -- unique words keep
> appearing and keep bloating the beast).

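One way to check this on your own corpus is to track how many distinct
tokens have been seen after each trained message.  A minimal sketch follows,
assuming plain lowercase whitespace splitting rather than the project's real
tokenizer; the function name and sample messages are made up for
illustration.

```python
def vocabulary_growth(messages):
    """Return the cumulative number of distinct tokens after each message.

    Assumption: tokens are lowercase whitespace-separated words, which is
    far cruder than a real mail tokenizer.
    """
    seen = set()
    growth = []
    for text in messages:
        # Add this message's tokens to the set of everything seen so far.
        seen.update(text.lower().split())
        growth.append(len(seen))
    return growth


# Hypothetical sample messages, only to show the shape of the output.
messages = [
    "cheap pills buy now",
    "buy cheap pills now",
    "meeting moved to friday",
    "meeting moved to monday",
]
print(vocabulary_growth(messages))  # prints [4, 4, 8, 9]
```

If the counts flatten out as more messages go in, growth is tailing off; if
they keep climbing at a steady rate, the database will keep growing.
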
Did you actually find this?  I found the growth tailed off dramatically
fairly quickly.  I no longer have the exact numbers, but database growth for
me dropped almost to nothing after I had trained on something like 1500

