[Spambayes] Problems getting started with IMAP...
Paul Michali
pcm at cisco.com
Wed Apr 21 07:37:29 EDT 2004
I'm keeping open questions/issues and adding more
responses in-line...
Paul Michali wrote:
>>> 2) When I ran with the "-b" option to configure, I
>>> could set up the folders, but I cannot display any
>>> stats (I get several error messages displayed, which
>>> I can provide, if it helps).
>>
>>
>>
>> Yes, it would help to have the error messages.
>
>
> Here's what the browser shows, when I pick the "More
> Statistics..." link:
>
> 500 Server error
>
> Traceback (most recent call last):
>   File "/users/pcm/lib/python/spambayes/Dibbler.py", line 461, in found_terminator
>     getattr(plugin, name)(**params)
>   File "/users/pcm/lib/python/spambayes/UserInterface.py", line 1016, in onStats
>     s = Stats.Stats()
>   File "/users/pcm/lib/python/spambayes/Stats.py", line 42, in __init__
>     self.CalculateStats()
>   File "/users/pcm/lib/python/spambayes/Stats.py", line 58, in CalculateStats
>     for msg in msginfoDB.db.keys():
> AttributeError: 'NoneType' object has no attribute 'keys'
>
>
>> Good point :) With the POP3 proxy, the web interface is always running
>> while the proxies are running, so it's not a problem. With the IMAP
>> filter, it's *either* the web interface *or* the filter. I vaguely
>> recall that there was a reason for this at the time, but I can't recall
>> what it was.
>>
>> You could run "imapfilter.py -b" as a separate process and use that to
>> view the web interface (although I'm not sure how happily the two would
>> play together in terms of accessing the databases). Maybe the web
>> interface should be served if the '-l' option is used? (I suspect that
>> the above reason that I can't remember might be something to do with
>> the fact that the script is short-lived without -l, so there isn't any
>> point in serving the interface).
>
>
> The stats would be nice, and training on incoming messages,
> like the POP3 one would be awesome!
It "seems" like I can run a second instance of sb_imapfilter.py
with the -b option. I'm not sure if they interact.
So, one open item is that I can't seem to see stats when I run
with "-b" option. Error messages shown above.
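For what it's worth, the traceback suggests that the message info
database had not been opened (its "db" attribute was None) when the
stats page asked for its keys. A minimal sketch of the kind of guard
that would avoid the error; MessageInfoDB and message_ids are
hypothetical stand-ins for illustration, not the real SpamBayes code:

```python
class MessageInfoDB:
    """Hypothetical stand-in for the message info store.

    The db attribute stays None until a database has been opened.
    """

    def __init__(self, db=None):
        self.db = db


def message_ids(msginfo_db):
    """Return the stored message ids, or an empty list if nothing is open."""
    if msginfo_db is None or msginfo_db.db is None:
        # No database yet: report no messages instead of raising
        # AttributeError on None.keys().
        return []
    return list(msginfo_db.db.keys())
```

With a guard like this the stats page would presumably show empty
statistics rather than a 500 error, though whether that is the right
behaviour is a separate question.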
>> I'd want to think about it for another day, but I don't see any reason at
>> the moment that this shouldn't be the case, so could check that change in
>> for 1.0b2.
>>
>>
>>> Is there a way to train, like what is done for the
>>> POP version?
>>
>>
>>
>> Yes. You can either use the 'upload' box on the web interface (running
>> with -b) to upload mbox files (or Outlook Express files, if you have
>> those), or you can use imapfilter itself. This is the purpose of the -t
>> switch (if you just use -c it'll classify/filter, but not train, and
>> vice-versa).
>
>
> Hmm. With IMAP though, when I look into my mail directory on
> my machine, I don't see the full messages. It almost looks
> like some kind of index or summary? Would I need to move the
> messages to a local folder and then "upload" them? I'll
> probably try to use the -t (but see below for some problems).
>
>
>>
>> Basically in the configuration, you should be able to select one or more
>> folders that contain ham, and one or more that contain spam. These can
>> be ones that you filter, or not (you might have specific "ham_to_train"
>> and "spam_to_train" folders, for example, or you might just use your
>> inbox and spam folder). The -t option gets imapfilter to run through
>> these folders and appropriately train any messages it hasn't seen before.
>>
>>
>>> 3) For filtering, I'm using just the INBOX. For training
>>> I have a HAM and SPAM folder and I have an UNSURE
>>> folder. What procedures do I follow, when I get e-mail
>>> that is tagged incorrectly? The cases I've seen so far
>>> are messages marked as unsure, which are either spam
>>> or ham, and messages marked as ham, which are spam.
>>
>>
>>
>> Any mistakes should be put in the correct training folder. The mail is
>> tagged with an id header that lets SpamBayes correct any mistraining,
>> too, so you can move from the spam training folder to the ham training
>> folder quite happily.
>
>
> OK. So I got a message for prescription drugs (love those :^).
> It was marked as "unsure" and was placed into my Unsure
> folder. I moved it to my Spam folder (used for training).
> Now it just sits there and the classification never changes
> to "spam". I have to move it to my inbox and then it will
> get reclassified and moved to the spam folder.
I'll see if the "move_trained_{ham,spam}_to_folder" options will
solve this issue. Waiting for some spam (boy, that sounds
weird :^)
>
>
>>
>>
>>> I move the message in the training folder, but do I
>>> leave it there, or do I wait 10 minutes and then delete
>>> it? Once trained, can I move it back to the INBOX?
>>> I like to have all e-mail that I've read, but haven't
>>> responded to, left in my INBOX.
>>
>>
>>
>> You can move/delete the messages out of the folder (and this may help
>> speed, as below) once they have been trained.
>>
>> There are also two [imap] options: 'move_trained_ham_to_folder' and
>> 'move_trained_spam_to_folder' which may be of use (I can't recall
>> offhand if these are exposed via the user interface or not. If not,
>> then you'd need to manually edit your configuration file). These do
>> pretty much what the names suggest - after mail has been trained, it
>> gets moved to the specified folder. For example, you could set
>> 'move_trained_ham_to_folder' to your Inbox. It does help (in terms of
>> speed) to keep the training folders relatively small, because the
>> filter needs to wade through them to figure out if there's anything new
>> there.
>
>
> Ah! I'll give that a try and see what happens. I was
> wondering how to tell when the thing is trained and
> how to get it to place messages in the right place.
> This would solve several problems, I think!
>
> Here's what I'll try:
>
> ham_train_folders:INBOX.Bayesian.MarkAsHam
> move_trained_ham_to_folder:INBOX
> spam_folder:INBOX.Bayesian.Spam
> spam_train_folders:INBOX.Bayesian.MarkAsSpam
> move_trained_spam_to_folder:INBOX.Bayesian.Spam
> unsure_folder:INBOX.Bayesian.Unsure
> filter_folders:INBOX
>
> Is there an option for "ham_folder"?
Let me know what you think about the above config
and if there is a "ham_folder" option.
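To make the intent of that configuration concrete, here is a minimal
sketch that reads the same option lines with Python's configparser and
runs two simple sanity checks. It assumes all of the options live in the
[imap] section and that each option names a single folder; the helper
names load_imap_options and check_folders are made up for illustration
and are not part of SpamBayes:

```python
import configparser

# The option lines from the message above, placed under an assumed
# [imap] section header.
CONFIG_TEXT = """
[imap]
ham_train_folders:INBOX.Bayesian.MarkAsHam
move_trained_ham_to_folder:INBOX
spam_folder:INBOX.Bayesian.Spam
spam_train_folders:INBOX.Bayesian.MarkAsSpam
move_trained_spam_to_folder:INBOX.Bayesian.Spam
unsure_folder:INBOX.Bayesian.Unsure
filter_folders:INBOX
"""


def load_imap_options(text):
    """Parse the text and return the [imap] options as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser["imap"])


def check_folders(options):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    if options["ham_train_folders"] == options["spam_train_folders"]:
        problems.append("ham and spam training folders are the same")
    for kind in ("ham", "spam"):
        source = options[kind + "_train_folders"]
        target = options.get("move_trained_" + kind + "_to_folder", "")
        # Moving trained mail back into its own training folder would
        # leave it there forever, so flag that case.
        if target and target == source:
            problems.append(kind + " mail is moved back into its training folder")
    return problems


if __name__ == "__main__":
    print(check_folders(load_imap_options(CONFIG_TEXT)))
```

The checks are only my guess at what could go wrong with such a setup;
the real filter may well accept configurations that this sketch flags.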
> I'm just wondering if
> I should have the Inbox filtered to a Ham, Spam, or Unsure
> folder, rather than having the Inbox filtered every 10
> minutes, since I typically leave messages in the Inbox for
> a while (which I could in this case, leave them in the
> Ham folder). Is there a better way than what I have here?
>
>
>>
>>
>>> 4) I'm seeing ham messages in my INBOX that, when I move
>>> them to other folders on the server, they often will
>>> reappear in my INBOX (as unread messages). What is
>>> going on?
>>
>>
>>
>> I don't know :) I suppose it's possible that SpamBayes is in the middle
>> of processing that message when you move it, but that seems unlikely.
>
>
> It seems, and I haven't verified it yet, that when the filter
> is running, I can only see the headers from my client. For
> example, clicking on the message won't show the body. Once
> the filter goes to sleep, I can access the message.
Verified. I cannot see the message bodies in the Inbox, when
the sb_imapfilter.py is running (once it sleeps it allows my
client to see the folder).
>
>
>> Does the message that you move have the SpamBayes headers? Does the new
>> one? It might help if you run imapfilter with "-i4", capturing the
>> output, during a time when this happens. This produces a log of the IMAP
>> conversation and we could use it to debug the problem. If so, be sure to
>> edit the file before you send it here, because it includes the IMAP
>> username and password in clear text.
>
>
> I'm not sure. I'll try the filtering with -i4 and see what
> happens and I'll closely monitor new messages and report
> back what I see (I just haven't seen any for a while).
I still need to do the -i4 option.
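Since the -i4 log contains the IMAP username and password in clear
text, the captured output needs scrubbing before it is posted. A
minimal sketch of doing that with a regular expression; it assumes the
credentials appear on lines containing the IMAP LOGIN command, and the
exact log layout in the sample is a guess rather than real output:

```python
import re

# Match a LOGIN command preceded by whitespace and drop everything
# after it on the same line. Requiring the whitespace avoids touching
# capability strings such as AUTH=LOGIN or LOGINDISABLED.
LOGIN_RE = re.compile(r"(\sLOGIN)\s+.*", re.IGNORECASE)


def redact_login_lines(log_text):
    """Return log_text with the arguments of any LOGIN command removed."""
    cleaned = [
        LOGIN_RE.sub(r"\1 <redacted>", line)
        for line in log_text.splitlines()
    ]
    return "\n".join(cleaned)


if __name__ == "__main__":
    # Hypothetical log excerpt, for illustration only.
    sample = '  12:00.01 > ABCD1 LOGIN someuser "secret"\n  12:00.02 < * OK'
    print(redact_login_lines(sample))
```

It is still worth reading through the result by hand, in case the
credentials show up somewhere this pattern does not cover.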
>
> I do know this... I can click on "get messages" button
> from my mail client (Netscape) and see, say five new messages
> in the Inbox (or Spam or Unsure). I can read each of them, so
> that their status is no longer "new" (unread).
>
> Then, I can click on the "get messages" button again, and all
> five will appear as "new", unread messages! If I look at them
> again and click on "get messages" they stay unchanged.
>
> Likewise, I can take an unread message, read it, move it to
> another folder, click "get messages" and the message reappears
> as a new, unread message in my Inbox!
Yes, I can verify that, when a new message arrives (marked as
unread), it does not have a spam classification. I can read it,
move it, whatever, but after sb_imapfilter.py runs, the message
re-appears in the Inbox, as unread, with a classification.
Would having the filter move the message from the inbox to a
spam, ham, or unsure folder resolve this issue, in that I would
just look at the ham folder for messages (and don't move/delete
them, until they're in that folder)? That coupled with a faster
filtering cycle? If so, is there a ham_folder option?
>>> I also had one case where a message was in
>>> my INBOX, I read it and replied to the sender. I read
>>> other messages in the INBOX and other folders (I have
>>> the IMAP server filter some messages directly to
>>> folders, so those are skipped by this process). I
>>> may have possibly done an Empty trash. When I looked
>>> back at the INBOX a few minutes later, the message
>>> was gone! Ouch!
>>
>>
>>
>> I presume it was filtered and classified as unsure/spam?
>
>
> I don't think it was, as it was in my Inbox and not in my
> Spam or Unsure folder.
>
>
>> It should therefore have appeared in one of those folders (although the
>> original is still there marked to be deleted, unless you did do the
>> empty trash). This is one of the problems with the filtering approach -
>> because the filter only executes every x minutes, in times between
>> execution there are messages in your folders that wouldn't normally be
>> there. Reducing x might help, if that doesn't put too much load on
>> things.
>
>
> I can try a -l 1 later on and see what happens. I haven't
> seen a message lost since, but I do see this "message
> appears as new twice" thing all the time.
>
>
>>
>>
>>> 5) I think I read on the list archive that messages are
>>> marked for deletion, but I don't seem to see classified
>>> messages that have been moved (copied?) to, say a SPAM
>>> folder by Spambayes. Is there some configuration that
>>> I need to set in my client to see these messages? Is
>>> there something I should be doing differently, when
>>> configuring/running Spambayes?
>>
>>
>>
>> Do you mean that messages classified as spam don't appear in your spam
>> folder? This would be bad, and the -i4 output mentioned above would
>> probably be needed to identify the problem. If that's not what you mean,
>> could you clarify?
>
>
> I guess what I was asking was that if a message was classified
> and then a new copy of the message was created, say marked as
> Spam, and the original was still there marked as deleted, should
> I be able to see the original message somewhere (trash)? I
> don't see the message anywhere. I have my client configured to
> "when I delete a message, move it to the trash folder". I'm
> just trying to see if there is a way to recover that message
> that was lost.
>
Could use your advice on the above stuff...
>
>>
>> The 'marked for deletion' comments are talking about the way the filter
>> moves/filters messages. Basically, IMAP doesn't provide any method for
>> moving messages (a terrible flaw in the spec, if you ask me), so the
>> filter has to create a new copy of the message, and mark the old one
>> for deletion. So there'll always be a (to be deleted) copy of every
>> message that SpamBayes (successfully) processes. This even applies to
>> ham messages, which are 'copied' to the same folder, so that the
>> SpamBayes headers get added (because IMAP also doesn't let you modify an
>> existing message, other than the flags).
Looking forward to your ideas on this.
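To check my understanding of the copy-then-mark-deleted approach
described above, here is a minimal sketch using Python's imaplib. The
function name move_message is made up, and this is not how
sb_imapfilter.py is actually written; it only illustrates the two IMAP
commands involved. It assumes "conn" is an imaplib.IMAP4 connection
that is already logged in with the source folder selected:

```python
def move_message(conn, uid, dest_folder):
    """Emulate a move: copy the message, then flag the original as deleted.

    Returns True if both steps succeeded. The original stays in the
    source folder, marked \\Deleted, until the folder is expunged.
    """
    # Step 1: copy the message to the destination folder.
    status, _ = conn.uid("COPY", uid, dest_folder)
    if status != "OK":
        # Leave the original untouched if the copy failed.
        return False
    # Step 2: mark the original for deletion (it is not removed yet).
    status, _ = conn.uid("STORE", uid, "+FLAGS", r"(\Deleted)")
    return status == "OK"
```

If this is roughly what happens, then the flagged original should still
be in the source folder until something expunges it (such as emptying
the trash), which would match the description quoted above.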
PCM @ WORK (Paul Michali)