[Spambayes] Moving a SpamBayes Database between platforms?

Skip Montanaro skip at pobox.com
Mon Jun 2 16:58:48 EDT 2003


    Hadar> I am running the Outlook plugin, and so far getting "mixed"
    Hadar> results. I think that this is due to the fact that I collect
    Hadar> email for _many_ addresses through one POP3 account.  Meaning,
    Hadar> the other accounts are all aliases or forwards into the one pop
    Hadar> account.

    Hadar> As an example, if I get three identical spam emails sent to me at
    Hadar> three distinct addresses, one is caught by SpamBayes, and the
    Hadar> other two are not. They're rarely tagged as "Possible Spam"
    Hadar> either.

I wouldn't think that would be a problem.  I have a similar setup.  I'm
chief cook and bottle washer for the mojam.com and musi-cal.com domains.  I
also get mail destined for skip at pobox.com.  Consequently, I get multiple
copies (often 5-10 copies) of most spam.  I do filter some of it out with a
message-id filter in my procmailrc file:

    # make sure we don't get two copies of the same message
    :0 Wh: msgid.lock
    | /usr/bin/formail -D 196608 $HOME/tmp/msgid.cache

but I still tend to get multiples of lots of stuff.  I've never noticed that
it nails one copy but misses others.  What are your spam and ham cutoffs set
to?  Sounds like you might be too tight (close to 1.0) on the spam cutoff.
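
For anyone not running procmail, the message-id filter above can be
approximated in a few lines of Python.  This is only a sketch of the idea,
not what formail actually does internally: the class name MessageIdCache and
the max_entries parameter are made up for illustration, and the cache is
bounded by entry count where formail bounds it by bytes (196608 in my
recipe).

```python
from collections import OrderedDict
from email import message_from_string


class MessageIdCache:
    """Bounded cache of recently seen Message-ID values (hypothetical helper)."""

    def __init__(self, max_entries=1000):
        self.max_entries = max_entries
        self._seen = OrderedDict()

    def is_duplicate(self, raw_message):
        """Return True if this message's Message-ID was seen recently."""
        msgid = message_from_string(raw_message).get("Message-ID")
        if msgid is None:
            # Nothing to compare against, so let the message through.
            return False
        msgid = msgid.strip()
        if msgid in self._seen:
            # Refresh the entry so frequently repeated ids stay cached.
            self._seen.move_to_end(msgid)
            return True
        self._seen[msgid] = True
        if len(self._seen) > self.max_entries:
            # Forget the oldest id once the cache is full.
            self._seen.popitem(last=False)
        return False
```

Copies of a spam sent separately to each address usually carry different
message-ids, which is why a filter like this only catches some of the
multiples.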

    Hadar> Ignore that for now, as it is getting ever-so-slightly better
    Hadar> each day, but I get _so much spam_ that this should only be a
    Hadar> matter of a few more weeks before I'm happy with the ratio.

I have the SpamAtBay beta loaded on my Windows machine at the moment.  I
have a rule in my procmailrc file for testing which just copies everything
to the email address Outlook is set up to read.  It started nailing spam
almost from the get-go.  It's not perfect yet.  There are still a lot of
unsures and the occasional false negative.  Haven't seen any false positives
yet that I can recall.
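
The "unsure" bucket falls out of having two cutoffs rather than one.  A
minimal sketch of the idea follows; the function name classify, the 0.20 and
0.90 values, and the exact handling of scores that land right on a cutoff
are assumptions for illustration, not lifted from the SpamBayes source.

```python
def classify(score, ham_cutoff=0.20, spam_cutoff=0.90):
    """Map a spam probability in [0.0, 1.0] to 'ham', 'unsure' or 'spam'.

    The cutoff values and boundary handling here are illustrative
    assumptions, not necessarily what SpamBayes itself uses.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"
    # Everything between the two cutoffs is left for a human to judge.
    return "unsure"
```

With these numbers a message scoring 0.95 is spam, but push the spam cutoff
up to 0.99 and the same message drops into unsure.  That is the sense in
which a cutoff close to 1.0 is "too tight".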

    Hadar> I would like to run all of the training on my client, via the
    Hadar> Outlook plugin. Then, on occasion I would like to "upload the
    Hadar> database" to my server, and use the "static" database to filter
    Hadar> via procmail.

    ...

    Hadar> Is this straightforward to do, meaning, are the databases
    Hadar> interoperable? Outlook 2000 running on Windows 2000 Professional,
    Hadar> with Procmail running on RH Linux 8.0 on the server side.

Maybe, if you are careful to run the same version of the underlying Berkeley
DB on both machines.  I just copied the

    c:\Documents And Settings\Administrator\Application Data\SpamBayes\
        default_bayes_database.db

file from Windows to my Mac OS X system, then tried to open it.  It failed,
but for an understandable reason: the Berkeley DB on my Mac is apparently
too old to read the file's hash format.  It complained:

    >>> db = bsddb.hashopen("default_bayes_database.db")
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "/Users/skip/local/lib/python2.3/bsddb/__init__.py", line 162, in hashopen
        d.open(file, db.DB_HASH, flags, mode)
    bsddb._db.DBInvalidArgError: (22, 'Invalid argument -- default_bayes_database.db: unsupported hash version: 8')
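
You can check which on-disk format version a file uses without opening it
through bsddb at all.  The sketch below assumes the metadata page layout
that, as far as I know, Berkeley DB 3.x and 4.x files share: a 32-bit magic
number at byte offset 12 followed by a 32-bit version at offset 16, in
either byte order.  The function names are made up, and files in the much
older 1.85 layout would simply come back as None.

```python
import struct

# Magic numbers for two Berkeley DB access methods (assumed from the
# 3.x/4.x on-disk format; other access methods are not handled here).
BDB_MAGIC = {0x00061561: "hash", 0x00053162: "btree"}


def bdb_format(header):
    """Return (access_method, version) from the first 20 bytes of a file.

    Returns None if the bytes do not look like a Berkeley DB metadata page.
    """
    if len(header) < 20:
        raise ValueError("need at least 20 bytes of header")
    # The file is written in the creating machine's byte order, so try both.
    for order in ("<", ">"):
        magic, version = struct.unpack_from(order + "II", header, 12)
        if magic in BDB_MAGIC:
            return BDB_MAGIC[magic], version
    return None


def bdb_format_of_file(path):
    """Convenience wrapper that reads the header from a file on disk."""
    with open(path, "rb") as f:
        return bdb_format(f.read(20))
```

Running bdb_format_of_file() on both copies of the database would tell you
whether each side is writing a hash version the other can read.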

Getting your two Berkeley DB versions in sync is just the first step.  You
then need to figure out if the structure of the values in the two files is
the same.  I can't answer that for you at the moment.  I suspect someone
else can though.
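
One way to sidestep the Berkeley DB version problem is to move the data as
plain text instead of copying the .db file.  Berkeley DB ships db_dump and
db_load utilities for exactly that.  The sketch below shows the same idea in
Python, under the assumption that the database behaves like a mapping from
bytes keys to bytes values; dump_db and load_db are hypothetical names, not
part of SpamBayes or bsddb.

```python
import base64


def dump_db(db, out_path):
    """Write every key/value pair of a bytes-to-bytes mapping as text."""
    with open(out_path, "w", encoding="ascii") as out:
        for key in db.keys():
            # base64 keeps arbitrary binary data safe on a single line.
            k = base64.b64encode(key).decode("ascii")
            v = base64.b64encode(db[key]).decode("ascii")
            out.write(k + " " + v + "\n")


def load_db(in_path, db):
    """Read a file written by dump_db back into a mapping."""
    with open(in_path, encoding="ascii") as src:
        for line in src:
            # Split on the single separating space; either side may be empty.
            k, v = line.rstrip("\n").split(" ")
            db[base64.b64decode(k)] = base64.b64decode(v)
```

This only moves the raw bytes across.  If the values are pickles or some
other structured encoding, the code on the receiving side still has to
agree on that structure.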

Skip


