RE: [spambayes-dev] 1.0 Build Testing (please!)
[As an aside, given the positive response from Richie, Kenny and me, I've gone ahead and done the release. I'm around tomorrow & Thursday if there needs to be a very quick 1.0.1. BTW, SpamBayes' 2nd birthday quietly passed three weeks ago! Here's hoping that the next stable release doesn't take another 2 years <wink>]
[VMWare description snipped]
Thanks for that. If I had more use for it, then it sounds like it could be quite useful (or something like VirtualPC, as Kenny mentioned, although I'm not sure how well Microsoft is treating it since they acquired it). For the moment, I'll have to make do (unless I can convince work that something like that might be useful ;)
I've tested the new build and it all works fine for me, except for uploading an Outlook Express mailbox for bulk training - it only trained on one message out of the mailbox. Has anyone else seen this?
I've done very little with the OE mailbox stuff. I tested it here now and got an interesting result. With the first folder I chose it only trained one message, but when I opened OE to take a look at how many there should have been (I was pretty sure it should be more) I realised that I had chosen an IMAP folder (which it wouldn't have been able to connect to) so maybe an error occurred when trying to train. If I just used my regular inbox, then it trained on all 11 messages that were there. I wonder if maybe when something goes wrong it only trains one and then silently fails? Was the dbx you used odd in any way?
My other problem is probably not our fault (and probably the result of trying to make my brain work first thing in the morning) but I set up an Outlook Express filter for To: "spam," and it didn't fire, even though SpamBayes was correctly prepending "spam," to the To: header. Am I missing something? The filter was set to work on the Inbox, and to move messages to a "Possible Spam" folder, but they are staying in the Inbox.
This is something I had never noticed before (I very seldom actually use OE for anything but testing). It seems that although we add "unsure," (etc) to the To list, OE strips the comma off the end. If you look in the preview pane, it has "unsure; ta-meyer@ihug.co.nz" (etc), with no comma. If you change the rule to just look for "spam" it'll work (but I think it will also fire for spambayes@python.org, too). The comma definitely stays if you notate the subject, rather than the To: header. This is a documentation error, then, I guess. The next release will include the fix that lets you change the classification words and still use the notate options, so people can use "fibbledegook-spam" and avoid this. I'm not sure what else we can do. =Tony Meyer
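The rule behaviour described above can be sketched in a few lines of Python. This assumes that an OE "contains" rule amounts to a case-insensitive substring test, which is a guess rather than documented behaviour; `rule_fires` is a hypothetical helper, and the header strings are modelled on what the preview pane shows.

```python
def rule_fires(to_header, needle):
    # Assumption: a "contains" rule is a case-insensitive substring test.
    return needle.lower() in to_header.lower()

# What OE appears to keep after stripping the trailing comma.
stripped = "spam; ta-meyer@ihug.co.nz"
list_mail = "spambayes@python.org"
custom = "fibbledegook-spam; ta-meyer@ihug.co.nz"

results = {
    "comma rule on stripped header": rule_fires(stripped, "spam,"),
    "plain rule on stripped header": rule_fires(stripped, "spam"),
    "plain rule on list mail": rule_fires(list_mail, "spam"),
    "custom rule on list mail": rule_fires(list_mail, "fibbledegook-spam"),
    "custom rule on custom header": rule_fires(custom, "fibbledegook-spam"),
}
for name, fired in results.items():
    print(name, fired)
```

Under that assumption the "spam," rule never matches, the bare "spam" rule matches both the notated message and ordinary list mail, and a distinctive word matches only the notated message.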
Tony Meyer <tameyer@ihug.co.nz> wrote:
This is something I had never noticed before (I very seldom actually use OE for anything but testing). It seems that although we add "unsure," (etc) to the To list, OE strips the comma off the end. If you look in the preview pane, it has "unsure; ta-meyer@ihug.co.nz" (etc), with no comma. If you change the rule to just look for "spam" it'll work (but I think it will also fire for spambayes@python.org, too).
I don't use Outlook Express either so I'm just guessing, but from the looks of it Outlook Express is treating the comma as a separator between two e-mail addresses. This would be consistent with the address-list definition in RFC 2822. We may need to consider some changes to the format of the information we add. The spec seems to indicate that the { and } characters are legal in an e-mail address, but I've rarely if ever seen them used. Maybe something like "{spam}original@address" instead of the comma-separator? It would obviously require people to modify their filter rules, but it doesn't appear that rules for the current format would work correctly anyway. -- Kenny Pitt
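As a rough illustration of the comma-as-separator reading, here is a minimal sketch using `email.utils.getaddresses` from the Python standard library. It is only an analogy for what Outlook Express appears to be doing, not a description of its parser, and the header values are the examples quoted in this thread.

```python
from email.utils import getaddresses

# The To: value as notated with the current comma format.
comma_form = getaddresses(["spam,ta-meyer@ihug.co.nz"])

# The brace-prefixed alternative suggested above.
brace_form = getaddresses(["{spam}original@address"])

# Each entry is a (display name, address) pair.
print(comma_form)
print(brace_form)
```

Python's parser splits the comma form into two separate entries, while the brace form stays a single address, which matches the behaviour we would want from a mail client.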
From: Kenny Pitt Sent: Tuesday, September 28, 2004 9:24 AM
<...>
We may need to consider some changes to the format of the information we add. The spec seems to indicate that the { and } characters are legal in an e-mail address, but I've rarely if ever seen them used. Maybe something like "{spam}original@address" instead of the comma-separator? It would obviously require people to modify their filter rules, but it doesn't appear that rules for the current format would work correctly anyway.
I don't use OE for anything but newsreading, but it seems that putting 'spam' or 'unsure' in the To: header, where one expects an address, is pretty odd. As far as legal address characters go, there are different rules for the domain part and the local part, and I doubt that MS paid any attention to those, anyway. There are no rules for a string like 'Spam', which contains neither a FQDN nor an "@" and is therefore not a legitimate address. Isn't it possible to modify the Subject: header and write a rule to filter on that? The only limitations there are 7-bit US-ASCII with the exception of the control codes, <CR><LF> and 0x7F. Suitable strings for the Subject: header might be '{Spam} ' and '{Unsure} ', as the curly braces would rarely be seen in real mail subjects. -- Seth Goodman
Seth Goodman <sethg@goodmanassociates.com> wrote:
I don't use OE for anything but newsreading, but it seems that putting 'spam' or 'unsure' in the To: header, where one expects an address, is pretty odd.
Well, the option is there and people are using it so I thought we should at least make it work correctly. <wink> I never cared for it, either, because it plays havoc with the ability to reply to the message if it was in fact legitimate, and I would have no reservations about simplifying the configuration and eliminating this option. On the other hand, users sometimes get unhappy when a feature that they are used to just disappears, and you don't have to turn it on if you don't need or want it.
Isn't it possible to modify the Subject: header and write a rule to filter on that? The only limitations there are 7-bit US-ASCII with the exception of the control codes, <CR><LF> and 0x7F. Suitable strings for the Subject: header might be '{Spam} ' and '{Unsure} ', as the curly braces would rarely be seen in real mail subjects.
There is a notate_subject option that modifies the Subject header instead of the To header, so users have a choice. Currently, the subject is also modified by prepending the classification and a comma ("spam," or "unsure,"). Bracketing the classification in [] or {} is probably the more common approach in other filters, but the end result is the same. -- Kenny Pitt
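For comparison, the two subject formats discussed here could be sketched as follows. This is not the SpamBayes implementation; `notate_subject` below is a hypothetical helper with the same name as the option, and whether the real code puts a space after the comma is an assumption left open here.

```python
from email import message_from_string

def notate_subject(msg, classification, braces=False):
    """Prepend a classification marker to the Subject header (hypothetical helper)."""
    subject = msg["Subject"] or ""
    if braces:
        marker = "{%s} " % classification.capitalize()  # e.g. "{Spam} "
    else:
        marker = classification + ","                   # e.g. "spam,"
    if "Subject" in msg:
        msg.replace_header("Subject", marker + subject)
    else:
        msg["Subject"] = marker + subject
    return msg

raw = "To: ta-meyer@ihug.co.nz\nSubject: Cheap pills\n\nbody\n"
comma_msg = notate_subject(message_from_string(raw), "spam")
brace_msg = notate_subject(message_from_string(raw), "spam", braces=True)
print(comma_msg["Subject"])
print(brace_msg["Subject"])
```

Either way the To: header is left alone, so replying to a misclassified message still works.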
participants (3)
- Kenny Pitt
- Seth Goodman
- Tony Meyer