[Spambayes] filtering in the face of disk quotas or full disks

bill parducci bill at parducci.net
Sat Mar 22 10:27:45 EST 2003

Skip Montanaro wrote:
> This is also not what I'm worried about.  While we need to provide means to
> manage the size of the database, that is essentially an offline activity.
> I'm worried simply about the situation where a mail message arrives and
> there's no disk space left to process it properly.

ok, but to date, this is a *manual* 'offline activity' that takes any number of homegrown solutions to resolve. while this is operationally acceptable to advanced users such as those who mind this list, i believe that it is impractical for the vast majority of those who could benefit from this solution (but are unable/unwilling to keep multiple copies of mail in numerous files, etc.)

> You really can't control the way the database file size grows.  Since it's
> implementing a hash, once the key density gets too high, it expands the
> database dramatically and shuffles things all around.  In between these
> striking leaps in size, the database grows little, if at all, for each new
> key added.

perhaps using the current hash architecture, but if you have the ability to maintain the size of the input pool (possibly via a secondary data store that handles raw tokens), then it seems illogical that the size of the db cannot be managed within reason.
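
to make the "maintain the size of the input pool" idea concrete, here is a minimal sketch of a capped token pool that evicts the least recently seen token once a limit is reached. everything in it (the class name BoundedTokenPool, the max_tokens limit, the eviction policy) is a hypothetical illustration and not part of Spambayes; it only shows that the number of keys can be bounded, and says nothing about how a particular hash file grows on disk.

```python
from collections import OrderedDict


class BoundedTokenPool:
    """Hypothetical capped token store (an assumption, not Spambayes code)."""

    def __init__(self, max_tokens):
        if max_tokens < 1:
            raise ValueError("max_tokens must be at least 1")
        self.max_tokens = max_tokens
        # token -> [spam_count, ham_count], ordered oldest-seen first
        self._counts = OrderedDict()

    def learn(self, tokens, is_spam):
        """Count each distinct token once for this message."""
        for tok in dict.fromkeys(tokens):  # dedupe, keep first-seen order
            rec = self._counts.pop(tok, None) or [0, 0]
            rec[0 if is_spam else 1] += 1
            self._counts[tok] = rec  # re-insert as most recently seen
            if len(self._counts) > self.max_tokens:
                # Evict the token that has gone unseen the longest.
                self._counts.popitem(last=False)

    def counts(self, tok):
        """Return (spam_count, ham_count); (0, 0) for unknown tokens."""
        return tuple(self._counts.get(tok, (0, 0)))

    def __len__(self):
        return len(self._counts)


pool = BoundedTokenPool(max_tokens=3)
pool.learn(["a", "b", "c"], is_spam=True)
pool.learn(["a", "d"], is_spam=False)  # "b" is now the oldest and is evicted
```

the trade-off is that evicted tokens lose their history, so a real policy would need to weigh staleness against how informative a token is.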

> Let me restate the problem: I just don't want Spambayes to be accused,
> rightly or wrongly, of losing mail because a disk quota was exceeded or a
> disk partition filled up.  Everything else is merely an inconvenience.  Lost
> mail can't be recovered.  What motivated this was an (incorrect, in my
> opinion) assumption by a sys admin where I work that because there was a
> failure in a mail setup using procmail and SpamAssassin when the disk quota
> was exceeded that it was obviously a SpamAssassin problem.  

good luck preventing misplaced accusations! :o) 
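
on the lost-mail worry itself, one possible approach is for the filter to "fail open": if scoring hits a full disk or an exceeded quota, hand back the original message untouched so the delivery agent still has something to deliver. the sketch below assumes a hypothetical score_and_tag callable that returns the message with a classification header added; filter_safely and DISK_ERRORS are likewise illustrative names, not Spambayes or procmail APIs.

```python
import errno

# Error codes treated as "out of space"; EDQUOT is missing on some platforms.
DISK_ERRORS = {errno.ENOSPC, getattr(errno, "EDQUOT", errno.ENOSPC)}


def filter_safely(raw_message, score_and_tag):
    """Return the tagged message, or the untouched original on disk errors.

    score_and_tag is a hypothetical callable (an assumption for this sketch)
    that takes the raw message and returns it with a header added.
    """
    try:
        return score_and_tag(raw_message)
    except OSError as exc:
        if exc.errno in DISK_ERRORS:
            # Fail open: an unscored message is an inconvenience,
            # a dropped one cannot be recovered.
            return raw_message
        raise  # unrelated I/O failures should still surface


def tag_ok(msg):
    return b"X-Hypothetical-Score: ham\n" + msg


def tag_disk_full(msg):
    raise OSError(errno.ENOSPC, "No space left on device")


tagged = filter_safely(b"Subject: hi\n\nbody\n", tag_ok)
passed_through = filter_safely(b"Subject: hi\n\nbody\n", tag_disk_full)
```

this only covers the filter's own step; whether the final delivery succeeds on a full disk is still up to the delivery agent.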

