[Spambayes] Problems getting started with IMAP...

Tony Meyer tameyer at ihug.co.nz
Tue Apr 20 21:08:26 EDT 2004


> 1) Is it correct to use the "-p" (pickled?) database
>     for this platform? I couldn't get the "-d" option
>     to work and don't quite understand this part of it.

If you have some sort of Python dbm module installed, then you can use the
-d option (which is the default).  bsddb is one example (I think it is
usually included with most Python 2.3.x installs, but I'm really only
familiar with the Windows one).  In any case, using a pickle will work
fine.
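
As a rough way to check which case applies on a given machine, the sketch
below tries to import a few dbm-style modules and falls back to the pickle
option if none are present.  `pick_storage` is a hypothetical helper, not
part of SpamBayes, and the candidate module names are assumptions about
what a particular Python install might ship.

```python
import importlib


def pick_storage(candidates=("bsddb", "bsddb3", "dbm")):
    """Return ("-d", module_name) if a dbm-style module imports,
    otherwise ("-p", None) to suggest the pickle store.

    The candidate names are illustrative guesses, not a list taken
    from SpamBayes itself.
    """
    for name in candidates:
        try:
            importlib.import_module(name)
        except ImportError:
            continue
        return ("-d", name)
    return ("-p", None)


if __name__ == "__main__":
    flag, module = pick_storage()
    print("suggested option:", flag, "(module: %s)" % module)
```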

> 2) When I ran with the "-b" option to configure, I
>     could setup the folders,  but I cannot display any
>     stats (I get several error messages displayed, which
>     I can provide, if it helps).

Yes, it would help to have the error messages.

>     Once I run the script
>     with classifying, I cannot access localhost:8080 at
>     all (connection refused). How do I see the stats?

[I presume the 8080 is a typo and you meant 8880, or you've changed the
config to put the ui at 8080]

Good point :)  With the POP3 proxy, the web interface is always running
while the proxies are running, so it's not a problem.  With the IMAP filter,
it's *either* the web interface *or* the filter.  I vaguely recall that
there was a reason for this at the time, but I can't recall what it was.

You could run "imapfilter.py -b" as a separate process and use that to view
the web interface (although I'm not sure how happily the two would play
together in terms of accessing the databases).  Maybe the web interface
should be served if the '-l' option is used?  (I suspect that the reason I
can't remember has something to do with the fact that the script is
short-lived without -l, so there isn't any point in serving the interface.)

I'd want to think about it for another day, but I don't see any reason at
the moment that this shouldn't be the case, so I could check that change in
for 1.0b2.

>     Is there a way to train, like what is done for the
>     POP version?

Yes.  You can either use the 'upload' box on the web interface (running with
-b) to upload mbox files (or Outlook Express files, if you have those), or
you can use imapfilter itself.  This is the purpose of the -t switch (if you
just use -c it'll classify/filter, but not train, and vice-versa).

Basically in the configuration, you should be able to select one or more
folders that contain ham, and one or more that contain spam.  These can be
ones that you filter, or not (you might have specific "ham_to_train" and
"spam_to_train" folders, for example, or you might just use your inbox and
spam folder).  The -t option gets imapfilter to run through these folders
and appropriately train any messages it hasn't seen before.

> 3) For filtering, I'm using just the INBOX. For training
>     I have a HAM and SPAM folder and I have an UNSURE
>     folder. What procedures do I follow, when I get e-mail
>     that is tagged incorrectly? The cases I've seen so far
>     are messages marked as unsure, which are either spam
>     or ham, and messages marked as ham, which are spam.

Any mistakes should be put in the correct training folder.  The mail is
tagged with an id header that lets SpamBayes correct any mistraining, too,
so you can move from the spam training folder to the ham training folder
quite happily.

>     I move the message in the training folder, but do I
>     leave it there, or do I wait 10 minutes and then delete
>     it?  Once trained, can I move it back to the INBOX?
>     I like to have all e-mail that I've read, but haven't
>     responded to, left in my INBOX.

You can move/delete the messages out of the folder (and this may help speed,
as below) once they have been trained.

There are also two [imap] options: 'move_trained_ham_to_folder' and
'move_trained_spam_to_folder' which may be of use (I can't recall offhand if
these are exposed via the user interface or not.  If not, then you'd need to
manually edit your configuration file).  These do pretty much what the names
suggest - after mail has been trained, it gets moved to the specified
folder.  For example, you could set 'move_trained_ham_to_folder' to your
Inbox.  It does help (in terms of speed) to keep the training folders
relatively small, because the filter needs to wade through them to figure
out if there's anything new there.
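
If the options do need to be set by hand, the snippet below prints a
fragment of roughly the right shape.  This is only a sketch: it assumes the
configuration file is in a ConfigParser-compatible format, and
"Trained-Spam" is a made-up folder name used for illustration.

```python
import configparser
import io

# Option names are the two mentioned above; the folder values are examples.
config = configparser.ConfigParser()
config["imap"] = {
    "move_trained_ham_to_folder": "INBOX",
    "move_trained_spam_to_folder": "Trained-Spam",
}

buffer = io.StringIO()
config.write(buffer)
fragment = buffer.getvalue()
print(fragment)
```

The printed section would then be merged into the existing configuration
file, replacing the folder names with ones that exist on the server.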

> 4) I'm seeing ham messages in my INBOX that, when I move
>     them to other folders on the server, they often will
>     reappear in my INBOX (as unread messages). What is
>     going on?

I don't know :)  I suppose it's possible that SpamBayes is in the middle of
processing that message when you move it, but that seems unlikely.  Does the
message that you move have the SpamBayes headers?  Does the new one?  It
might help if you run imapfilter with "-i4", capturing the output, during a
time when this happens.  This produces a log of the IMAP conversation and we
could use it to debug the problem.  If so, be sure to edit the file before
you send it here, because it includes the IMAP username and password in
clear text.
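
A minimal sketch of scrubbing such a log before posting it is below.  It
assumes the credentials appear on lines containing an IMAP LOGIN command,
which may not hold for every server or authentication method, so the
result should still be checked by eye.  `redact_logins` is a hypothetical
helper, not something shipped with SpamBayes.

```python
import re

# Match "LOGIN" followed by spaces/tabs, then everything to the end of
# that line (the username and password arguments).
LOGIN_RE = re.compile(r"(\bLOGIN[ \t]+).*", re.IGNORECASE)


def redact_logins(log_text):
    """Replace the arguments of any LOGIN command with a placeholder."""
    return LOGIN_RE.sub(r"\1<redacted>", log_text)


if __name__ == "__main__":
    sample = 'A1 LOGIN bob "secret"\nA2 SELECT INBOX\n'
    print(redact_logins(sample))
```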

>     I also had one case where a message was in
>     my INBOX, I read it and replied to the sender. I read
>     other messages in the INBOX and other folders (I have
>     the IMAP server filter some messages directly to
>     folders, so those are skipped by this process). I
>     may have possibly done an Empty trash. When I looked
>     back at the INBOX a few minutes later, the message
>     was gone!  Ouch!

I presume it was filtered and classified as unsure/spam?  It should
therefore have appeared in one of those folders (although the original is
still there marked to be deleted, unless you did do the empty trash).  This
is one of the problems with the filtering approach - because the filter only
executes every x minutes, in times between execution there are messages in
your folders that wouldn't normally be there.  Reducing x might help, if
that doesn't put too much load on things.

> 5) I think I read on the list archive about messages are
>     marked for deletion, but I don't seem to see classified
>     messages that have been moved (copied?) to, say a SPAM
>     folder by Spambayes. Is there some configuration that
>     I need to set in my client to see these messages? Is
>     there something I should be doing differently, when
>     configuring/running Spambayes?

Do you mean that messages classified as spam don't appear in your spam
folder?  This would be bad, and the -i4 output mentioned above would
probably be needed to identify the problem.  If that's not what you mean,
could you clarify?

The 'marked for deletion' comments are talking about the way the filter
moves/filters messages.  Basically, IMAP doesn't provide any method for
moving messages (a terrible flaw in the spec, if you ask me), so the filter
has to create a new copy of the message, and mark the old one for deletion.
So there'll always be a (to be deleted) copy of every message that SpamBayes
(successfully) processes.  This even applies to ham messages, which are
'copied' to the same folder, so that the SpamBayes headers get added
(because IMAP also doesn't let you modify an existing message, other than
the flags).
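
The copy-then-mark-deleted pattern described above can be sketched with
Python's standard imaplib as follows.  This is an illustration of the
general technique only, not the code imapfilter actually uses;
`move_message` is a hypothetical helper, and `conn` is assumed to be an
already logged-in `imaplib.IMAP4` connection with a folder selected.

```python
import imaplib  # conn below is expected to be an imaplib.IMAP4 instance


def move_message(conn, uid, dest_folder):
    """Emulate a move: COPY the message, then flag the original \\Deleted.

    The original stays in the source folder, marked for deletion, until
    the folder is expunged (for example by an "empty trash" in the client).
    """
    status, _ = conn.uid("COPY", uid, dest_folder)
    if status != "OK":
        # Leave the original untouched if the copy failed.
        return False
    status, _ = conn.uid("STORE", uid, "+FLAGS", r"(\Deleted)")
    return status == "OK"
```

Checking the COPY result before setting the flag means a failed copy never
leaves a message that exists only as a to-be-deleted original.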

=Tony Meyer

---
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes. This
way, you get everyone's help, and avoid a lack of replies when I'm busy.



