RE: [Spambayes] big imapfilter.py problem
But no messages got classified spam or unsure, AFAICT. Even after I move some of the spam training messages into my inbox, they're not classified as spam.
I think I have fixed this now. I've changed imapfilter so that instead of iterating through the entire RFC822 message when we go through a folder, we just retrieve the headers (which we can use to determine if the message has been trained/classified or not). If we then have to do something to the message, the substance is retrieved.
This should speed things up (no more retrieving the substance many times), and I think I fixed the bug that stopped messages being filtered along the way.
If you could check it (again!) that would be great. If it's still not working, could you run it with "-i4" and see whether it's doing any FETCH RFC822[.PEEK] commands? If not, then there's something up with the db checking, if so then it's something else.
BTW, I am working on the headers-with-no-line-endings thing (I grabbed the el source that you suggested), but it's taking a while...
=Tony Meyer
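For readers following along, the headers-first approach described above might look roughly like the sketch below. It is not the actual imapfilter code: `fetch_headers_then_body` and the `needs_body` callback are hypothetical names, `conn` is assumed to behave like an `imaplib.IMAP4` object with a folder already selected, and the sketch uses Python 3 with `BODY.PEEK[...]`, the IMAP4rev1 way of fetching without setting the \Seen flag, where the thread mentions RFC822[.PEEK].

```python
import email


def fetch_headers_then_body(conn, uid, needs_body):
    """Fetch only the headers first; fetch the whole message only on demand.

    conn       -- assumed to behave like imaplib.IMAP4 (folder already selected)
    uid        -- message UID as a string, e.g. "303"
    needs_body -- hypothetical callback: takes the parsed headers and returns
                  True if the full message has to be retrieved
    """
    # Cheap first pass: headers only, without marking the message as seen.
    typ, data = conn.uid("FETCH", uid, "(BODY.PEEK[HEADER])")
    if typ != "OK":
        raise RuntimeError("header fetch failed for UID %s" % uid)
    headers = email.message_from_bytes(data[0][1])

    if not needs_body(headers):
        return headers, None

    # Second pass, only when something has to be done with the message.
    typ, data = conn.uid("FETCH", uid, "(BODY.PEEK[])")
    if typ != "OK":
        raise RuntimeError("message fetch failed for UID %s" % uid)
    return headers, email.message_from_bytes(data[0][1])
```

A caller could pass something like `lambda h: h["X-Spambayes-Classification"] is None` as `needs_body`, so that messages which already carry the classification header are never downloaded in full.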
Meyer, Tony wrote:
If you could check it (again!) that would be great. If it's still not working, could you run it with "-i4" and see whether it's doing any FETCH RFC822[.PEEK] commands? If not, then there's something up with the db checking, if so then it's something else.
Got the new version, and it seems good so far. The training db contains the correct numbers, at least. Don't know if the filtering is working, because I haven't got any spam in my inbox.
I left the previous version of imapfilter running on my work PC when I left last night, filtering every 15 minutes. There was no spam in my inbox when I checked from home, and imapfilter was still running this morning :-)
Olly
"Meyer, Tony" <T.A.Meyer@massey.ac.nz> writes:
But no messages got classified spam or unsure, AFAICT. Even after I move some of the spam training messages into my inbox, they're not classified as spam.
I think I have fixed this now. I've changed imapfilter so that instead of iterating through the entire RFC822 message when we go through a folder, we just retrieve the headers (which we can use to determine if the message has been trained/classified or not). If we then have to do something to the message, the substance is retrieved.
This should speed things up (no more retrieving the substance many times), and I think I fixed the bug that stopped messages being filtered along the way.
If you could check it (again!) that would be great. If it's still not working, could you run it with "-i4" and see whether it's doing any FETCH RFC822[.PEEK] commands? If not, then there's something up with the db checking, if so then it's something else.
It's something else. With -i5: ... \nC++-sig mailing list\r\nC++-sig@python.org\r\nhttp://mail.python.org/mailman/listinfo/c++-sig\r\n')"] 26:47.80 < UID 303) 26:47.81 untagged_responses[FETCH] 1 += [" UID 303)"] 26:47.81 < BBHC5 OK completed 26:47.81 matched r'(?P<tag>BBHC\d+) (?P<type>[A-Z]+) (?P<data>.*)' => ('BBHC5', 'OK', 'completed') 26:47.81 untagged_responses[FETCH] => [('267 (RFC822 {3537}', 'Return-Path: <c++-sig-admin@python.org>\r\nReceived: fr om mx04.mrf.mail.rcn.net ([207.172.4.53] verified)\r\n\tby stlport.com (CommuniGate Pro SMTP 3.5.9)\r\n\twith ESMTP id 2 04725 for dave@boost-consulting.com;\r\n\tSat, 01 Mar 2003 12:33:06 -0800\r\nReceived: from mail.python.org ([12.155.117 .29])\r\n\tby mx04.mrf.mail.rcn.net with esmtp (Exim 3.35 #4)\r\n\tid 18pDfM-0001pg-00\r\n\tfor david.abrahams@rcn.com; Sat, 01 Mar 2003 15:33:04 -0500\r\nReceived: from localhost.localdomain ([127.0.0.1] helo=mail.python.org)\r\n\tby mail. python.org with esmtp (Exim 4.05)\r\n\tid 18pDfL-00048J-00; Sat, 01 Mar 2003 15:33:03 -0500\r\nReceived: from srv.global ite.com.br ([200.180.16.1])\r\n\tby mail.python.org with esmtp (Exim 4.05)\r\n\tid 18pDem-00047V-00\r\n\tfor c++-sig@pyt hon.org; Sat, 01 Mar 2003 15:32:28 -0500\r\nReceived: from globalite.com.br (ixstoj@bridge.int-02.globalite.com.br\r\n\t [200.180.16.8])\r\n\tby srv.globalite.com.br (8.12.7/8.12.6) with ESMTP id h21HUKU7071413\r\n\tfor <c++-sig@python.org>; Sat, 1 Mar 2003 17:30:21 GMT\r\nMessage-ID: <3E6118D9.8030103@globalite.com.br>\r\nFrom: Nicodemus <nicodemus@globalite .com.br>\r\nUser-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US;\r\n\trv:1.2.1) Gecko/20021130\r\nX-Accept-Langua ge: en-us, en\r\nMIME-Version: 1.0\r\nTo: c++-sig@python.org\r\nSubject: Re: [C++-sig] Support for member operators?\r\n References: <20030301201353.83610.qmail@web20205.mail.yahoo.com>\r\nIn-Reply-To: <20030301201353.83610.qmail@web20205.ma il.yahoo.com>\r\nContent-Type: text/plain; charset=us-ascii; 
format=flowed\r\nContent-Transfer-Encoding: 7bit\r\nX-Spam- Status: No, hits=-4.9 required=5.0\r\n\ttests=BODY_PYTHON_ZOPE,EMAIL_ATTRIBUTION,FROM_BR_PT_AR,IN_REP_TO,RCVD_SPAMLAND_2 ,REFERENCES,SPAM_PHRASE_00_01,USER_AGENT,USER_AGENT_MOZILLA_UA,X_ACCEPT_LANG\r\nX-Spam-Level: \r\nSender: c++-sig-admin@ python.org\r\nErrors-To: c++-sig-admin@python.org\r\nX-BeenThere: c++-sig@python.org\r\nX-Mailman-Version: 2.0.13 (10127 0)\r\nPrecedence: bulk\r\nReply-To: c++-sig@python.org\r\nList-Help: <mailto:c++-sig-request@python.org?subject=help>\r\ nList-Post: <mailto:c++-sig@python.org>\r\nList-Subscribe: <http://mail.python.org/mailman/listinfo/c++-sig>,\r\n\t<mail to:c++-sig-request@python.org?subject=subscribe>\r\nList-Id: Development of Python/C++ integration <c++-sig.python.org>\ r\nList-Unsubscribe: <http://mail.python.org/mailman/listinfo/c++-sig>,\r\n\t<mailto:c++-sig-request@python.org?subject= unsubscribe>\r\nList-Archive: <http://mail.python.org/pipermail/c++-sig/>\r\nDate: Sat, 01 Mar 2003 17:32:25 -0300\r\nX- Spambayes-Classification: unsure\r\nThanks for the reply Ralf,\r\n\r\nRalf W. Grosse-Kunstleve wrote:\r\n\r\n>--- Nicode mus <nicodemus@globalite.com.br> wrote:\r\n> \r\n>\r\n>> const C operator+(int o)\r\n>>...\r\n>>\r\n>>If I try to e xpose the operator+ like this:\r\n>>\r\n>> .def( self + other<int>() )\r\n>>\r\n>>I get a compiler error: "no operat or "+" matches these operands"\r\n>> \r\n>>\r\n>\r\n>What happens if you change the member function signature to\r\n> \r\n>C operator+(int o) const\r\n> \r\n>\r\n\r\nIt works then. 8)\r\nBut what if a class defines a operator + that is n ot const, ie., it \r\nchanges an attribute of the class? 
Can Boost.Python export this, or a \r\nparticular signature is required to expose operators?\r\n\r\n>>Does Boost.Python support member operators?\r\n>> \r\n>>\r\n>\r\n>I am pretty sure it does, but your placement of "const" seems very unusual.\r\n> \r\n>\r\n\r\nIt means to return a "const C" object , not that the operator+ is const.\r\n\r\nNicodemus.\r\n\r\n\r\n\r\n_______________________________________________\r\nC ++-sig mailing list\r\nC++-sig@python.org\r\nhttp://mail.python.org/mailman/listinfo/c++-sig\r\n'), ' UID 303)']
Traceback (most recent call last):
  File "imapfilter.py", line 697, in ?
    run()
  File "imapfilter.py", line 683, in run
    imap_filter.Train()
  File "imapfilter.py", line 524, in Train
    num_ham_trained = folder.Train(self.classifier, False)
  File "imapfilter.py", line 464, in Train
    for msg in self:
  File "imapfilter.py", line 394, in __iter__
    yield self[key]
  File "imapfilter.py", line 441, in __getitem__
    msg.get_substance()
  File "imapfilter.py", line 291, in get_substance
    new_msg = email.Parser.Parser().parsestr(data["RFC822"])
  File "/usr/local/lib/python2.2/email/Parser.py", line 75, in parsestr
    return self.parse(StringIO(text), headersonly=headersonly)
  File "/usr/local/lib/python2.2/email/Parser.py", line 62, in parse
    self._parseheaders(root, fp)
  File "/usr/local/lib/python2.2/email/Parser.py", line 128, in _parseheaders
    raise Errors.HeaderParseError(
email.Errors.HeaderParseError: Not a header, not a continuation: ``Thanks for the reply Ralf,''
BTW, I am working on the headers-with-no-line-endings thing (I grabbed the el source that you suggested), but it's taking a while...
Good luck!
-- Dave Abrahams Boost Consulting www.boost-consulting.com
David Abrahams <dave@boost-consulting.com> writes:
"Meyer, Tony" <T.A.Meyer@massey.ac.nz> writes:
But no messages got classified spam or unsure, AFAICT. Even after I move some of the spam training messages into my inbox, they're not classified as spam.
I think I have fixed this now. I've changed imapfilter so that instead of iterating through the entire RFC822 message when we go through a folder, we just retrieve the headers (which we can use to determine if the message has been trained/classified or not). If we then have to do something to the message, the substance is retrieved.
This should speed things up (no more retrieving the substance many times), and I think I fixed the bug that stopped messages being filtered along the way.
If you could check it (again!) that would be great. If it's still not working, could you run it with "-i4" and see whether it's doing any FETCH RFC822[.PEEK] commands? If not, then there's something up with the db checking, if so then it's something else.
It's something else. With -i5:
... \nC++-sig mailing list\r\nC++-sig@python.org\r\nhttp://mail.python.org/mailman/listinfo/c++-sig\r\n')"] 26:47.80 < UID 303) 26:47.81 untagged_responses[FETCH] 1 += [" UID 303)"]
The problem appears to be that imapfilter.py added an X-Spambayes-Classification: header to the message, but failed to add a newline afterwards, which is required to separate it from the message body.
...err, but I forgot to set PYTHONPATH to use email-2.5. Training works when I do that.
-- Dave Abrahams Boost Consulting www.boost-consulting.com
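The header/body separator issue discussed above can be reproduced in isolation. The sketch below uses the email package from a current Python 3, not the python2.2 or email-2.5 versions from the thread, so the behaviour differs from the traceback: instead of raising HeaderParseError, the modern parser records a defect on the message. `missing_separator` and `add_classification` are hypothetical helper names, and the sample messages are cut-down illustrations rather than the real message from the log.

```python
import email
from email import errors

# A well-formed message: a blank line separates the headers from the body.
GOOD = (
    "Subject: test\r\n"
    "X-Spambayes-Classification: unsure\r\n"
    "\r\n"
    "Thanks for the reply Ralf,\r\n"
)

# The suspected malformed shape: the body follows the last header directly.
BAD = (
    "Subject: test\r\n"
    "X-Spambayes-Classification: unsure\r\n"
    "Thanks for the reply Ralf,\r\n"
)


def missing_separator(raw):
    """Return True if the parser saw body text where a blank line should be."""
    msg = email.message_from_string(raw)
    return any(
        isinstance(defect, errors.MissingHeaderBodySeparatorDefect)
        for defect in msg.defects
    )


def add_classification(raw, label):
    """Add the header through the Message API so the separator is preserved."""
    msg = email.message_from_string(raw)
    msg["X-Spambayes-Classification"] = label
    return msg.as_string()
```

Setting the header on the parsed message and letting the package serialise it, rather than splicing text into the raw string, is one way to avoid producing the malformed shape in the first place.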
participants (3)
- David Abrahams
- Meyer, Tony
- Oliver Maunder