[Spambayes] Moving a SpamBayes Database between platforms?
skip at pobox.com
Mon Jun 2 16:58:48 EDT 2003
Hadar> I am running the Outlook plugin, and so far getting "mixed"
Hadar> results. I think that this is due to the fact that I collect
Hadar> email for _many_ addresses through one POP3 account. Meaning,
Hadar> the other accounts are all aliases or forwards into the one pop
Hadar> As an example, if I get three identical spam emails sent to me at
Hadar> three distinct addresses, one is caught by SpamBayes, and the
Hadar> other two are not. They're rarely tagged as "Possible Spam"
I wouldn't think that would be a problem. I have a similar setup. I'm
chief cook and bottle washer for the mojam.com and musi-cal.com domains. I
also get mail destined for skip at pobox.com. Consequently, I get multiple
copies (often 5-10 copies) of most spam. I do filter some of it out with a
message-id filter in my procmailrc file:
# make sure we don't get two copies of the same message
:0 Wh: msgid.lock
| /usr/bin/formail -D 196608 $HOME/tmp/msgid.cache
but I still tend to get multiples of lots of stuff. I've never noticed that
it nails one copy but misses others. What are your spam and ham cutoffs set
to? Sounds like you might be too tight (close to 1.0) on the spam cutoff.
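
To make the cutoff question concrete, here is a minimal sketch of how a pair of cutoffs turns a message score into a verdict. The function name, the default values of 0.20 and 0.90, and the exact handling of scores that land on a boundary are assumptions for illustration, not taken from the SpamBayes source.

```python
def classify(score, ham_cutoff=0.20, spam_cutoff=0.90):
    """Map a spam score in [0.0, 1.0] to "ham", "unsure" or "spam".

    The default cutoffs and the boundary comparisons are assumptions
    made for this sketch.
    """
    if not 0.0 <= ham_cutoff <= spam_cutoff <= 1.0:
        raise ValueError("expected 0 <= ham_cutoff <= spam_cutoff <= 1")
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"
    # Everything between the two cutoffs is left for a human to judge.
    return "unsure"
```

With these assumed defaults a message scoring 0.93 is classified as spam, while the same message with `spam_cutoff=0.99` falls into the unsure range. That is the kind of effect a too-tight spam cutoff would have.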
Hadar> Ignore that for now, as it is getting ever-so-slightly better
Hadar> each day, but I get _so much spam_ that this should only be a
Hadar> matter of a few more weeks before I'm happy with the ratio.
I have the SpamAtBay beta loaded on my Windows machine at the moment. I
have a rule in my procmailrc file for testing which just copies everything
to the email address Outlook is set up to read. It started nailing spam
almost from the get-go. It's not perfect yet. There are still a lot of
unsures and the occasional false negative. Haven't seen any false positives
yet that I can recall.
Hadar> I would like to run all of the training on my client, via the
Hadar> Outlook plugin. Then, on occasion I would like to "upload the
Hadar> database" to my server, and use the "static" database to filter
Hadar> via procmail.
Hadar> Is this straightforward to do, meaning, are the databases
Hadar> interoperable? Outlook 2000 running on Windows 2000 Professional,
Hadar> with Procmail running on RH Linux 8.0 on the server side.
Maybe, if you are careful to run the same version of the underlying Berkeley
DB on both machines. I just copied the database file in
c:\Documents And Settings\Administrator\Application Data\SpamBayes\
from Windows to my Mac OS X system, then tried to open it. It failed,
but for a reasonable reason. I am apparently still running too old a
version of Berkeley DB on my Mac. It complained:
>>> db = bsddb.hashopen("default_bayes_database.db")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/Users/skip/local/lib/python2.3/bsddb/__init__.py", line 162, in hashopen
    d.open(file, db.DB_HASH, flags, mode)
bsddb._db.DBInvalidArgError: (22, 'Invalid argument -- default_bayes_database.db: unsupported hash version: 8')
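
The "unsupported hash version: 8" message refers to a format version stored in the file itself, so it can be checked before copying the file anywhere. The sketch below reads that version straight from the file header using only the standard library and is written for a current Python 3. The function name is hypothetical, and the header layout it relies on (magic number in the 4 bytes at offset 12, format version in the 4 bytes at offset 16, either byte order) and the two magic numbers are my assumptions about the Berkeley DB on-disk format rather than something stated in this thread. Very old 1.85-style files are not handled.

```python
import struct

# Assumed Berkeley DB magic numbers, keyed to the access method they mark.
DB_MAGIC = {0x061561: "hash", 0x053162: "btree"}


def bdb_file_version(path):
    """Return (access_method, format_version) read from a file header.

    Assumes the magic number sits at byte offset 12 and the on-disk
    format version at byte offset 16, in either byte order.
    """
    with open(path, "rb") as f:
        header = f.read(20)
    if len(header) < 20:
        raise ValueError("file too short to be a Berkeley DB file")
    # The file may have been written on a machine of either endianness.
    for byte_order in ("<", ">"):
        magic, version = struct.unpack_from(byte_order + "II", header, 12)
        if magic in DB_MAGIC:
            return DB_MAGIC[magic], version
    raise ValueError("no known Berkeley DB magic number found")
```

Calling `bdb_file_version("default_bayes_database.db")` on each machine's copy would show whether the file's format version is one the local Berkeley DB library can be expected to open.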
Getting your two Berkeley DB versions in sync is just the first step. You
then need to figure out if the structure of the values in the two files is
the same. I can't answer that for you at the moment. I suspect someone
else can though.
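
If the two Berkeley DB versions cannot be brought in sync, one possible workaround is to move the records through a plain-text file instead of copying the database file itself. The sketch below is written for a current Python 3 and works on any dict-like object with bytes keys and values. The function names and the one-record-per-line base64 format are made up for illustration. It copies the values as opaque bytes, so it does nothing to answer whether the structure of the values is the same on both sides.

```python
import base64


def dump_records(db, path):
    """Write each key/value pair of a bytes-to-bytes mapping as one text line."""
    with open(path, "w", encoding="ascii") as out:
        for key in db.keys():
            k = base64.b64encode(key).decode("ascii")
            v = base64.b64encode(db[key]).decode("ascii")
            # base64 output never contains a space, so one space separates the fields.
            out.write(k + " " + v + "\n")


def load_records(path, db):
    """Read lines written by dump_records back into a dict-like object."""
    with open(path, "r", encoding="ascii") as src:
        for line in src:
            k, v = line.rstrip("\n").split(" ")
            db[base64.b64decode(k)] = base64.b64decode(v)
    return db
```

The idea would be to run `dump_records` against the open database on the client, transfer the text file, and run `load_records` into a freshly created database on the server, so that each side only ever touches a file written by its own Berkeley DB library.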