[Spambayes] Problems getting started with IMAP...

Wed Apr 21 07:37:29 EDT 2004

I'm keeping open questions/issues and adding more
responses in-line...

Paul Michali wrote:

>>> 2) When I ran with the "-b" option to configure, I
>>>    could set up the folders, but I cannot display any
>>>    stats (I get several error messages displayed, which
>>>    I can provide, if it helps).
>>
>>
>>
>> Yes, it would help to have the error messages.
> 
> 
> Here's what the browser shows, when I pick the "More
> Statistics..." link:
> 
> 500 Server error
> 
> Traceback (most recent call last):
> 
>   File "/users/pcm/lib/python/spambayes/Dibbler.py", line 461, in found_terminator
>     getattr(plugin, name)(**params)
>
>   File "/users/pcm/lib/python/spambayes/UserInterface.py", line 1016, in onStats
>     s = Stats.Stats()
>
>   File "/users/pcm/lib/python/spambayes/Stats.py", line 42, in __init__
>     self.CalculateStats()
>
>   File "/users/pcm/lib/python/spambayes/Stats.py", line 58, in CalculateStats
>     for msg in msginfoDB.db.keys():
> 
> AttributeError: 'NoneType' object has no attribute 'keys'
> 
> 

>> Good point :)  With the POP3 proxy, the web interface is always running
>> while the proxies are running, so it's not a problem.  With the IMAP
>> filter, it's *either* the web interface *or* the filter.  I vaguely
>> recall that there was a reason for this at the time, but I can't recall
>> what it was.
>>
>> You could run "imapfilter.py -b" as a separate process and use that to
>> view the web interface (although I'm not sure how happily the two would
>> play together in terms of accessing the databases).  Maybe the web
>> interface should be served if the '-l' option is used?  (I suspect that
>> the above reason that I can't remember might be something to do with the
>> fact that the script is short-lived without -l, so there isn't any point
>> in serving the interface.)
> 
> 
> The stats would be nice, and training on incoming messages,
> like the POP3 one would be awesome!

It "seems" like I can run a second instance of sb_imapfilter.py
with the -b option. I'm not sure if they interact.

So, one open item is that I can't see stats when I run
with the "-b" option. The error messages are shown above.
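For reference, the traceback shows CalculateStats() in Stats.py calling msginfoDB.db.keys() while msginfoDB.db is None. That suggests the message-info database was never opened in the process serving the web interface. Below is a minimal sketch of a guard that would avoid the crash. It assumes db simply stays None until the database is opened. MessageInfoDB and count_messages are hypothetical names, not SpamBayes code:

```python
# Hypothetical stand-in for the message-info database used by Stats.py.
# Assumption: 'db' stays None until the database has actually been opened.
class MessageInfoDB:
    def __init__(self, db=None):
        self.db = db


def count_messages(msginfo_db):
    """Count stored message records, tolerating an unopened database."""
    if msginfo_db.db is None:
        # Calling .keys() here would raise:
        # AttributeError: 'NoneType' object has no attribute 'keys'
        return 0
    return len(list(msginfo_db.db.keys()))


print(count_messages(MessageInfoDB()))                        # 0
print(count_messages(MessageInfoDB({"id1": {}, "id2": {}})))  # 2
```

A real fix would more likely open the database in the web-interface process than report zero, but the guard shows where the None comes from.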

>> I'd want to think about it for another day, but I don't see any reason
>> at the moment why this shouldn't be the case, so I could check that
>> change in for 1.0b2.
>>
>>
>>>    Is there a way to train, like what is done for the
>>>    POP version?
>>
>>
>>
>> Yes.  You can either use the 'upload' box on the web interface (running
>> with -b) to upload mbox files (or Outlook Express files, if you have
>> those), or you can use imapfilter itself.  This is the purpose of the -t
>> switch (if you just use -c it'll classify/filter, but not train, and
>> vice-versa).
> 
> 
> Hmm. With IMAP though, when I look into my mail directory on
> my machine, I don't see the full messages. It almost looks
> like some kind of index or summary? Would I need to move the
> messages to a local folder and then "upload" them? I'll
> probably try to use the -t (but see below for some problems).
> 
> 
>>
>> Basically in the configuration, you should be able to select one or more
>> folders that contain ham, and one or more that contain spam.  These can
>> be ones that you filter, or not (you might have specific "ham_to_train"
>> and "spam_to_train" folders, for example, or you might just use your
>> inbox and spam folder).  The -t option gets imapfilter to run through
>> these folders and appropriately train any messages it hasn't seen before.
>>
>>
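The -t behaviour described above (train only on messages not seen before) could be sketched as follows. The sketch assumes each message carries an id header and that the trainer remembers which ids it has already trained on. The header name X-Spambayes-MailId and the helper select_untrained are assumptions for illustration, not imapfilter's actual code:

```python
from email import message_from_string

# Assumed header name for the SpamBayes message id; treat it as illustrative.
ID_HEADER = "X-Spambayes-MailId"


def select_untrained(raw_messages, trained_ids):
    """Return the raw messages in a training folder that still need training.

    A message needs training if it has no id header yet, or if its id is
    not among the ids already trained on.
    """
    result = []
    for raw in raw_messages:
        msg = message_from_string(raw)
        msg_id = msg.get(ID_HEADER)
        if msg_id is None or msg_id not in trained_ids:
            result.append(raw)
    return result


folder = [
    "X-Spambayes-MailId: 101\n\nalready trained\n",
    "X-Spambayes-MailId: 102\n\nnew message\n",
    "Subject: no id header yet\n\nalso new\n",
]
print(len(select_untrained(folder, trained_ids={"101"})))  # 2
```

Every run still has to look at every message in the folder, even the ones it skips. That cost grows with folder size, which is why small training folders are faster.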
>>> 3) For filtering, I'm using just the INBOX. For training
>>>    I have a HAM and SPAM folder and I have an UNSURE
>>>    folder. What procedures do I follow, when I get e-mail
>>>    that is tagged incorrectly? The cases I've seen so far
>>>    are messages marked as unsure, which are either spam
>>>    or ham, and messages marked as ham, which are spam.
>>
>>
>>
>> Any mistakes should be put in the correct training folder.  The mail is
>> tagged with an id header that lets SpamBayes correct any mistraining,
>> too, so you can move messages from the spam training folder to the ham
>> training folder quite happily.
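The point about correcting mistraining can be illustrated with a small bookkeeping sketch. The trainer records how each message id was trained. If the same id later turns up in the other training folder, the old training is backed out before the new one is applied. Trainer and its two counters are hypothetical; the real classifier tracks per-token counts rather than two totals:

```python
class Trainer:
    """Hypothetical sketch: remember how each message id was trained so a
    message moved to the other training folder can be retrained."""

    def __init__(self):
        self.trained_as = {}               # message id -> True (spam) / False (ham)
        self.counts = {"spam": 0, "ham": 0}

    def train(self, msg_id, is_spam):
        previous = self.trained_as.get(msg_id)
        if previous == is_spam:
            return "unchanged"             # already trained this way
        if previous is not None:
            # Mistrained earlier: undo the old training first.
            self.counts["spam" if previous else "ham"] -= 1
        self.counts["spam" if is_spam else "ham"] += 1
        self.trained_as[msg_id] = is_spam
        return "retrained" if previous is not None else "trained"


t = Trainer()
print(t.train("101", is_spam=True))   # trained
print(t.train("101", is_spam=False))  # retrained
print(t.counts)                       # {'spam': 0, 'ham': 1}
```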
> 
> 
> OK. So I got a message for prescription drugs (love those :^).
> It was marked as "unsure" and was placed into my Unsure
> folder. I moved it to my Spam folder (used for training).
> Now it just sits there and the classification never changes
> to "spam". I have to move it to my inbox and then it will
> get reclassified and moved to the spam folder.

I'll see if the "move_trained_{ham,spam}_to_folder" options will
solve this issue. Waiting for some spam (boy, that sounds
weird :^)

> 
> 
>>
>>
>>>    I move the message in the training folder, but do I
>>>    leave it there, or do I wait 10 minutes and then delete
>>>    it?  Once trained, can I move it back to the INBOX?
>>>    I like to have all e-mail that I've read, but haven't
>>>    responded to, left in my INBOX.
>>
>>
>>
>> You can move/delete the messages out of the folder (and this may help
>> speed, as below) once they have been trained.
>>
>> There are also two [imap] options: 'move_trained_ham_to_folder' and
>> 'move_trained_spam_to_folder' which may be of use (I can't recall
>> offhand if these are exposed via the user interface or not.  If not,
>> then you'd need to manually edit your configuration file).  These do
>> pretty much what the names suggest: after mail has been trained, it
>> gets moved to the specified folder.  For example, you could set
>> 'move_trained_ham_to_folder' to your Inbox.  It does help (in terms of
>> speed) to keep the training folders relatively small, because the filter
>> needs to wade through them to figure out if there's anything new there.
> 
> 
> Ah! I'll give that a try and see what happens. I was
> wondering how to tell when the thing is trained and
> how to get it to place messages in the right place.
> This would solve several problems, I think!
> 
> Here's what I'll try:
> 
> ham_train_folders:INBOX.Bayesian.MarkAsHam
> move_trained_ham_to_folder:INBOX
> spam_folder:INBOX.Bayesian.Spam
> spam_train_folders:INBOX.Bayesian.MarkAsSpam
> move_trained_spam_to_folder:INBOX.Bayesian.Spam
> unsure_folder:INBOX.Bayesian.Unsure
> filter_folders:INBOX
> 
> Is there an option for "ham_folder"?

Let me know what you think about the above config
and if there is a "ham_folder" option.

> I'm just wondering if
> I should have the Inbox filtered to a Ham, Spam, or Unsure
> folder, rather than having the Inbox filtered every 10
> minutes, since I typically leave messages in the Inbox for
> a while (in this case, I could leave them in the Ham
> folder instead). Is there a better way than what I have here?
> 
> 
>>
>>
>>> 4) I'm seeing ham messages in my INBOX that, when I move
>>>    them to other folders on the server, they often will
>>>    reappear in my INBOX (as unread messages). What is
>>>    going on?
>>
>>
>>
>> I don't know :)  I suppose it's possible that SpamBayes is in the middle
>> of processing that message when you move it, but that seems unlikely.
> 
> 
> It seems, and I haven't verified it yet, that when the filter
> is running, I can only see the headers from my client. For
> example, clicking on the message won't show the body. Once
> the filter goes to sleep, I can access the message.

Verified. I cannot see the message bodies in the Inbox while
sb_imapfilter.py is running (once it sleeps, my client can see
the folder again).

> 
> 
>> Does the message that you move have the SpamBayes headers?  Does the new
>> one?  It might help if you run imapfilter with "-i4", capturing the
>> output, during a time when this happens.  This produces a log of the
>> IMAP conversation and we could use it to debug the problem.  If you do,
>> be sure to edit the file before you send it here, because it includes
>> the IMAP username and password in clear text.
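Editing the log by hand works fine. A scripted version could look like the sketch below. The exact -i4 output format isn't shown in this thread, so the sketch assumes the login appears as a plain "TAG LOGIN user password" line. redact_login is a hypothetical helper, and the result should still be checked by eye before posting:

```python
import re

# Assumption: the log contains the IMAP login as "TAG LOGIN user password"
# on a single line. Quoted or multi-line forms are NOT handled.
LOGIN_RE = re.compile(r"(\bLOGIN[ \t]+)\S+[ \t]+\S+")


def redact_login(text):
    """Replace the username and password that follow an IMAP LOGIN command."""
    return LOGIN_RE.sub(r"\1<user> <password>", text)


sample = "A001 LOGIN pcm s3cret\nA001 OK LOGIN completed\n"
print(redact_login(sample))
# A001 LOGIN <user> <password>
# A001 OK LOGIN completed
```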
> 
> 
> I'm not sure. I'll try the filtering with -i4 and see what
> happens and I'll closely monitor new messages and report
> back what I see (I just haven't seen any for a while).

I still need to try the -i4 option.

> 
> I do know this... I can click on the "get messages" button
> in my mail client (Netscape) and see, say, five new messages
> in the Inbox (or Spam or Unsure). I can read each of them, so
> that their status is no longer "new" (unread).
>
> Then, I can click on the "get messages" button again, and all
> five will appear as "new", unread messages!  If I look at them
> again and click on "get messages", they stay unchanged.
> 
> Likewise, I can take an unread message, read it, move it to
> another folder, click "get messages" and the message reappears
> as a new, unread message in my Inbox!

Yes, I can verify that, when a new message arrives (marked as
unread), it does not have a spam classification. I can read it,
move it, whatever, but after sb_imapfilter.py runs, the message
re-appears in the Inbox, as unread, with a classification.

Would having the filter move the message from the Inbox to a
spam, ham, or unsure folder resolve this issue? I would then
just look at the ham folder for messages (and not move/delete
them until they're in that folder), maybe combined with a
shorter filtering interval. If so, is there a ham_folder option?

>>>    I also had one case where a message was in
>>>    my INBOX, I read it and replied to the sender. I read
>>>    other messages in the INBOX and other folders (I have
>>>    the IMAP server filter some messages directly to
>>>    folders, so those are skipped by this process). I
>>>    may also have done an Empty Trash. When I looked
>>>    back at the INBOX a few minutes later, the message
>>>    was gone!  Ouch!
>>
>>
>>
>> I presume it was filtered and classified as unsure/spam?
> 
> 
> I don't think it was, as it was in my Inbox and not in my
> Spam or Unsure folder.
> 
> 
>> It should therefore have appeared in one of those folders (although the
>> original is still there marked to be deleted, unless you did do the
>> empty trash).  This is one of the problems with the filtering approach:
>> because the filter only executes every x minutes, in the time between
>> executions there are messages in your folders that wouldn't normally be
>> there.  Reducing x might help, if that doesn't put too much load on
>> things.
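Until a folder is expunged (for example by "Empty Trash"), the originals marked for deletion are still on the server with the \Deleted flag set. A client that hides deleted messages simply won't show them. Below is a sketch using Python's standard imaplib (select, search and store are real IMAP4 methods). It assumes an already logged-in connection. undelete_all is a hypothetical helper:

```python
import imaplib


def undelete_all(conn, folder="INBOX"):
    """Clear the \\Deleted flag on every message in `folder` that still has it.

    Only messages that have not yet been expunged can be recovered this way.
    """
    typ, _ = conn.select(folder)
    if typ != "OK":
        raise imaplib.IMAP4.error("cannot select %s" % folder)
    typ, data = conn.search(None, "DELETED")
    nums = data[0].split()          # e.g. b"2 5" -> [b"2", b"5"]
    for num in nums:
        conn.store(num, "-FLAGS", "\\Deleted")
    return len(nums)
```

Usage would be along the lines of `conn = imaplib.IMAP4("mail.example.com")`, `conn.login(user, password)`, `undelete_all(conn)`, where the host name is a placeholder. This undeletes *every* flagged message in the folder, including originals the filter has already copied elsewhere, so expect some duplicates.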
> 
> 
> I can try "-l 1" later on and see what happens. I haven't
> seen a message lost since, but I see this "message appears
> as new twice" behaviour all the time.
> 
> 
>>
>>
>>> 5) I think I read in the list archive about messages being
>>>    marked for deletion, but I don't seem to see classified
>>>    messages that have been moved (copied?) to, say a SPAM
>>>    folder by Spambayes. Is there some configuration that
>>>    I need to set in my client to see these messages? Is
>>>    there something I should be doing differently, when
>>>    configuring/running Spambayes?
>>
>>
>>
>> Do you mean that messages classified as spam don't appear in your spam
>> folder?  This would be bad, and the -i4 output mentioned above would
>> probably be needed to identify the problem.  If that's not what you mean,
>> could you clarify?
> 
> 
> I guess what I was asking was: if a message was classified
> and a new copy of the message was created (say, marked as
> Spam), with the original still there marked as deleted, should
> I be able to see the original message somewhere (trash)? I
> don't see the message anywhere.  I have my client configured to
> "when I delete a message, move it to the trash folder".  I'm
> just trying to see if there is a way to recover that message
> that was lost.
> 

Could use your advice on the above stuff...

> 
>>
>> The 'marked for deletion' comments are talking about the way the filter
>> moves/filters messages.  Basically, IMAP doesn't provide any method for
>> moving messages (a terrible flaw in the spec, if you ask me), so the
>> filter has to create a new copy of the message, and mark the old one for
>> deletion.  So there'll always be a (to be deleted) copy of every message
>> that SpamBayes (successfully) processes.  This even applies to ham
>> messages, which are 'copied' to the same folder, so that the SpamBayes
>> headers get added (because IMAP also doesn't let you modify an existing
>> message, other than the flags).
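The copy-then-delete approach described above can be sketched with Python's standard imaplib (copy, store, fetch and append are real IMAP4 methods). move_message and readd_with_header are hypothetical helpers, not SpamBayes functions. The sketch assumes a logged-in connection with the source folder already selected:

```python
import imaplib
from email import message_from_bytes


def move_message(conn, num, dest):
    """Emulate a move: COPY the message to dest, then flag the original
    as \\Deleted. The original stays in place until the folder is expunged."""
    typ, _ = conn.copy(num, dest)
    if typ != "OK":
        raise imaplib.IMAP4.error("COPY to %s failed" % dest)
    conn.store(num, "+FLAGS", "\\Deleted")


def readd_with_header(conn, num, folder, name, value, flags=None):
    """'Modify' a message by re-adding it: FETCH the raw message, add a
    header, APPEND the new version, then flag the original as \\Deleted.

    With flags=None the appended copy gets an empty flag list, i.e. it is
    unread; passing "(\\Seen)" would keep it marked as read.
    """
    typ, data = conn.fetch(num, "(RFC822)")
    if typ != "OK":
        raise imaplib.IMAP4.error("FETCH of message %r failed" % num)
    msg = message_from_bytes(data[0][1])
    msg[name] = value
    conn.append(folder, flags, None, msg.as_bytes())
    conn.store(num, "+FLAGS", "\\Deleted")
```

Because APPEND creates a brand-new message, its flags are whatever the caller passes. With none, the server stores it with an empty flag list, so it shows up as unread. That would be consistent with the "reappears as new, unread" behaviour reported above, but only the -i4 log could confirm what sb_imapfilter.py actually sends.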

Looking forward to your ideas on this.

PCM @ WORK (Paul Michali)