[Spambayes] Re: Guidance re pickles versus DB for Outlook

David Bolen db3l@fitlinxx.com
Tue Nov 26 22:34:24 2002


Skip Montanaro <skip@pobox.com> writes:

> For most of us who have *any* experience with ZODB it's probably all
> indirect via Zope, so there are probably some inaccurate perceptions about
> it.  These thoughts that have come to my mind at one time or another:

Just so you know there are at least some other experiences out there:
we've been using ZODB as the persistent storage for a scheduling
system of ours since late 2000, and have never used Zope itself, other
than as an installation from which to manually extract ZODB (we were
using it before there was a standalone package).

>     * How could a database from a company (Zope) whose sole business is not
>       databases be more reliable than a database from organizations whose
>       sole raison d'etre is databases (Sleepycat, Postgres, MySQL, ...)?

Since the default FileStorage back-end is really just a bunch of
concatenated object pickles with enough metadata to skip around the
file and mark transaction boundaries, you don't really need to compare
it to a full relational system, nor does it need to build in all the
capabilities such a system requires.  And of course, that's just the
FileStorage back-end.
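
To make the "concatenated pickles plus metadata" idea concrete, here
is a minimal sketch of an append-only pickle log with transaction
markers.  It is not the real FileStorage format: the record layout,
the COMMIT_OID sentinel and the function names are all assumptions
made up for illustration.

```python
import io
import pickle
import struct

# Hypothetical record layout (NOT the real FileStorage format):
#   8-byte object id, 4-byte payload length, then the pickle bytes.
# A transaction is a run of records closed by a commit marker.
HEADER = struct.Struct(">QI")
COMMIT_OID = 0xFFFFFFFFFFFFFFFF  # sentinel oid marking a transaction boundary


def append_transaction(f, objects):
    """Append one record per object, then a commit marker."""
    f.seek(0, io.SEEK_END)
    for oid, obj in objects.items():
        data = pickle.dumps(obj)
        f.write(HEADER.pack(oid, len(data)))
        f.write(data)
    f.write(HEADER.pack(COMMIT_OID, 0))
    f.flush()


def load_index(f):
    """Scan the log; return {oid: offset of its newest committed record}."""
    f.seek(0)
    index, pending = {}, {}
    while True:
        pos = f.tell()
        head = f.read(HEADER.size)
        if len(head) < HEADER.size:
            break  # end of file (or torn header): pending records are dropped
        oid, length = HEADER.unpack(head)
        if oid == COMMIT_OID:
            index.update(pending)  # the transaction is complete, accept it
            pending = {}
            continue
        if len(f.read(length)) < length:
            break  # torn payload from an interrupted write
        pending[oid] = pos
    return index


def load_object(f, offset):
    """Unpickle the record stored at the given offset."""
    f.seek(offset)
    _oid, length = HEADER.unpack(f.read(HEADER.size))
    return pickle.loads(f.read(length))
```

Because load_index only accepts records that are followed by a commit
marker, a write that is interrupted part-way costs the unfinished
transaction at the tail of the file and nothing else.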

In our experience the default FileStorage has proven very resilient,
although we do have a regular task that backs up and packs the
database.  But I don't think we've ever had one fail to load - we've
probably lost a final uncommitted transaction a time or two, but
that's to be expected, and things were still consistent.

We also pack because we're constantly modifying lots of the
persistent objects (as the scheduler runs) - I expect things would
grow more gradually with spambayes, and only during training, much as
the current pickle tends to stabilize.

>     * Dealing with Zope's monolithic system is frustrating to people (like
>       me) who are used to having files reside in filesystems.  Some of that
>       frustration probably carries over to ZODB, though it's almost
>       certainly not ZODB's problem.

That has little to do with ZODB - we've always used ZODB directly and
just consider it a way to persist our application objects (virtually)
transparently.

>     * It seems to grow without bound, else why do I need to pack my Data.fs
>       file every now and then?

That's specific to FileStorage - different back-ends can handle things
differently.  Most of the growth with FileStorage is to handle
transactions and rollbacks (it just keeps appending and ends up having
"deleted" copies of objects around).  But the appending also means
that it's rarely re-writing lots of older data, which helps with the
robustness.
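
The growth-then-pack behaviour can be shown with a toy in-memory
model.  This is only a sketch: the AppendOnlyStore class and its
methods are hypothetical, and its pack() drops every superseded
revision, which is a simplification rather than a description of how
FileStorage actually packs.

```python
import pickle


class AppendOnlyStore:
    """Toy model of an append-only object store (not ZODB's real code)."""

    def __init__(self):
        self.log = []    # (oid, pickled bytes) records, only ever appended
        self.index = {}  # oid -> position in self.log of the current revision

    def store(self, oid, obj):
        # Old revisions are never overwritten; the index just moves on.
        self.log.append((oid, pickle.dumps(obj)))
        self.index[oid] = len(self.log) - 1

    def load(self, oid):
        return pickle.loads(self.log[self.index[oid]][1])

    def pack(self):
        """Copy only the current revisions into a fresh log."""
        new_log, new_index = [], {}
        for oid, pos in sorted(self.index.items(), key=lambda item: item[1]):
            new_index[oid] = len(new_log)
            new_log.append(self.log[pos])
        self.log, self.index = new_log, new_index


store = AppendOnlyStore()
for count in range(1000):
    store.store("job-1", {"runs": count})  # same object, modified repeatedly
store.store("job-2", {"runs": 0})
print(len(store.log))  # 1001 records before packing
store.pack()
print(len(store.log))  # 2 records after packing
```

An object that is rewritten a thousand times leaves a thousand
records behind until the next pack, which matches the scheduler case;
a store that mostly changes during training would accumulate far
fewer.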

> It doesn't really matter if the perceptions are accurate or not.  They still
> need to be addressed to some extent before people are going to be
> comfortable with it.  ZODB is, for better or for worse, tied to Zope the
> application.  Accordingly, perceived problems with Zope will rub off on
> ZODB.

I'm not totally sure I agree - we're talking about ZODB being a
behind-the-scenes back-end to spambayes.  It's quite possible that
many users (at least end users as opposed to developers) would never
even think about how the information is being stored, nor care as
long as it worked.

One interesting possibility for ZODB that I haven't seen mentioned
here is using ZEO to let multiple clients share the same database
transparently.  I might use that to ensure that I have access to the
same training data whether I read with Outlook at work or at home.
Later enhancements could perhaps key the database data (either ham
only or everything) by the executing user, and permit the data itself
to be stored centrally for a community of users (and thus backed up
and packed if necessary by an administrator).

-- David




