[Spambayes] Re: big imapfilter.py problem

David Abrahams dave at boost-consulting.com
Tue Apr 29 23:28:16 EDT 2003


"Meyer, Tony" <T.A.Meyer at massey.ac.nz> writes:

>> But no messages got classified spam or unsure, AFAICT.
>> Even after I move some of the spam training messages into my 
>> inbox, they're not classified as spam.
>
> I think I have fixed this now.  I've changed imapfilter so that instead
> of iterating through the entire RFC822 message when we go through a
> folder, we just retrieve the headers (which we can use to determine if
> the message has been trained/classified or not).  If we then have to do
> something to the message, the substance is retrieved.
>
> This should speed things up (no more retrieving the substance many
> times), and I think I fixed the bug that stopped messages being filtered
> along the way.
>
> If you could check it (again!) that would be great.  If it's still not
> working, could you run it with "-i4" and see whether it's doing any
> FETCH RFC822[.PEEK] commands?  If not, then there's something up with
> the db checking; if so, then it's something else.
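
A minimal sketch of the header-first approach described above. It assumes a hypothetical `LazyIMAPMessage` class and a caller-supplied `fetch_full(uid)` callable that returns the full RFC 822 text (for instance from an IMAP FETCH of `BODY.PEEK[]`); neither name comes from imapfilter.py. Only the header block (for example from a `BODY.PEEK[HEADER]` fetch) is parsed up front, and the full message is fetched at most once, then cached:

```python
import email
from email.message import Message
from typing import Callable, Optional


class LazyIMAPMessage:
    """Hypothetical sketch: headers now, full message only on demand."""

    def __init__(self, uid: str, header_text: str,
                 fetch_full: Callable[[str], str]) -> None:
        self.uid = uid
        # Parse only the header block (e.g. from a BODY.PEEK[HEADER] fetch).
        self.headers: Message = email.message_from_string(header_text)
        self._fetch_full = fetch_full
        self._full: Optional[Message] = None

    def is_classified(self) -> bool:
        # The headers alone are enough to decide whether work is needed.
        return "X-Spambayes-Classification" in self.headers

    def get_substance(self) -> Message:
        # Fetch and parse the full message the first time only, then reuse it.
        if self._full is None:
            self._full = email.message_from_string(self._fetch_full(self.uid))
        return self._full


# Usage with a stand-in fetcher that records each full-message fetch.
fetched = []


def fake_fetch(uid: str) -> str:
    fetched.append(uid)
    return "Subject: hi\r\n\r\nbody\r\n"


msg = LazyIMAPMessage("303", "Subject: hi\r\n\r\n", fake_fetch)
if not msg.is_classified():
    msg.get_substance()
    msg.get_substance()
print(fetched)  # ['303']
```

Because `get_substance` caches its result, repeated calls on the same message cost a single fetch, and messages whose headers show they are already classified are never fetched in full.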

It's something else.  With -i5:

...
\nC++-sig mailing list\r\nC++-sig at python.org\r\nhttp://mail.python.org/mailman/listinfo/c++-sig\r\n')"]
  26:47.80 <  UID 303)
  26:47.81 untagged_responses[FETCH] 1 += [" UID 303)"]
  26:47.81 < BBHC5 OK completed
  26:47.81      matched r'(?P<tag>BBHC\d+) (?P<type>[A-Z]+) (?P<data>.*)' => ('BBHC5', 'OK', 'completed')
  26:47.81 untagged_responses[FETCH] => [('267 (RFC822 {3537}', 'Return-Path: <c++-sig-admin at python.org>\r\nReceived: from mx04.mrf.mail.rcn.net ([207.172.4.53] verified)\r\n\tby stlport.com (CommuniGate Pro SMTP 3.5.9)\r\n\twith ESMTP id 204725 for dave at boost-consulting.com;\r\n\tSat, 01 Mar 2003 12:33:06 -0800\r\nReceived: from mail.python.org ([12.155.117.29])\r\n\tby mx04.mrf.mail.rcn.net with esmtp (Exim 3.35 #4)\r\n\tid 18pDfM-0001pg-00\r\n\tfor david.abrahams at rcn.com; Sat, 01 Mar 2003 15:33:04 -0500\r\nReceived: from localhost.localdomain ([127.0.0.1] helo=mail.python.org)\r\n\tby mail.python.org with esmtp (Exim 4.05)\r\n\tid 18pDfL-00048J-00; Sat, 01 Mar 2003 15:33:03 -0500\r\nReceived: from srv.globalite.com.br ([200.180.16.1])\r\n\tby mail.python.org with esmtp (Exim 4.05)\r\n\tid 18pDem-00047V-00\r\n\tfor c++-sig at python.org; Sat, 01 Mar 2003 15:32:28 -0500\r\nReceived: from globalite.com.br (ixstoj at bridge.int-02.globalite.com.br\r\n\t[200.180.16.8])\r\n\tby srv.globalite.com.br (8.12.7/8.12.6) with ESMTP id h21HUKU7071413\r\n\tfor <c++-sig at python.org>; Sat, 1 Mar 2003 17:30:21 GMT\r\nMessage-ID: <3E6118D9.8030103 at globalite.com.br>\r\nFrom: Nicodemus <nicodemus at globalite.com.br>\r\nUser-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US;\r\n\trv:1.2.1) Gecko/20021130\r\nX-Accept-Language: en-us, en\r\nMIME-Version: 1.0\r\nTo: c++-sig at python.org\r\nSubject: Re: [C++-sig] Support for member operators?\r\nReferences: <20030301201353.83610.qmail at web20205.mail.yahoo.com>\r\nIn-Reply-To: <20030301201353.83610.qmail at web20205.mail.yahoo.com>\r\nContent-Type: text/plain; charset=us-ascii; format=flowed\r\nContent-Transfer-Encoding: 7bit\r\nX-Spam-Status: No, hits=-4.9 required=5.0\r\n\ttests=BODY_PYTHON_ZOPE,EMAIL_ATTRIBUTION,FROM_BR_PT_AR,IN_REP_TO,RCVD_SPAMLAND_2,REFERENCES,SPAM_PHRASE_00_01,USER_AGENT,USER_AGENT_MOZILLA_UA,X_ACCEPT_LANG\r\nX-Spam-Level: \r\nSender: c++-sig-admin@python.org\r\nErrors-To: c++-sig-admin at python.org\r\nX-BeenThere: c++-sig at python.org\r\nX-Mailman-Version: 2.0.13 (101270)\r\nPrecedence: bulk\r\nReply-To: c++-sig at python.org\r\nList-Help: <mailto:c++-sig-request at python.org?subject=help>\r\nList-Post: <mailto:c++-sig at python.org>\r\nList-Subscribe: <http://mail.python.org/mailman/listinfo/c++-sig>,\r\n\t<mailto:c++-sig-request at python.org?subject=subscribe>\r\nList-Id: Development of Python/C++ integration <c++-sig.python.org>\r\nList-Unsubscribe: <http://mail.python.org/mailman/listinfo/c++-sig>,\r\n\t<mailto:c++-sig-request at python.org?subject=unsubscribe>\r\nList-Archive: <http://mail.python.org/pipermail/c++-sig/>\r\nDate: Sat, 01 Mar 2003 17:32:25 -0300\r\nX-Spambayes-Classification: unsure\r\nThanks for the reply Ralf,\r\n\r\nRalf W. Grosse-Kunstleve wrote:\r\n\r\n>--- Nicodemus <nicodemus at globalite.com.br> wrote:\r\n>  \r\n>\r\n>>     const C operator+(int o)\r\n>>...\r\n>>\r\n>>If I try to expose the operator+ like this:\r\n>>\r\n>>     .def( self + other<int>() )\r\n>>\r\n>>I get a compiler error: "no operator "+" matches these operands"\r\n>>    \r\n>>\r\n>\r\n>What happens if you change the member function signature to\r\n>\r\n>C operator+(int o) const\r\n>  \r\n>\r\n\r\nIt works then. 8)\r\nBut what if a class defines a operator + that is not const, ie., it \r\nchanges an attribute of the class? Can Boost.Python export this, or a \r\nparticular signature is required to expose operators?\r\n\r\n>>Does Boost.Python support member operators?\r\n>>    \r\n>>\r\n>\r\n>I am pretty sure it does, but your placement of "const" seems very unusual.\r\n>  \r\n>\r\n\r\nIt means to return a "const C" object, not that the operator+ is const.\r\n\r\nNicodemus.\r\n\r\n\r\n\r\n_______________________________________________\r\nC++-sig mailing list\r\nC++-sig at python.org\r\nhttp://mail.python.org/mailman/listinfo/c++-sig\r\n'), ' UID 303)']
Traceback (most recent call last):
  File "imapfilter.py", line 697, in ?
    run()
  File "imapfilter.py", line 683, in run
    imap_filter.Train()
  File "imapfilter.py", line 524, in Train
    num_ham_trained = folder.Train(self.classifier, False)
  File "imapfilter.py", line 464, in Train
    for msg in self:
  File "imapfilter.py", line 394, in __iter__
    yield self[key]
  File "imapfilter.py", line 441, in __getitem__
    msg.get_substance()
  File "imapfilter.py", line 291, in get_substance
    new_msg = email.Parser.Parser().parsestr(data["RFC822"])
  File "/usr/local/lib/python2.2/email/Parser.py", line 75, in parsestr
    return self.parse(StringIO(text), headersonly=headersonly)
  File "/usr/local/lib/python2.2/email/Parser.py", line 62, in parse
    self._parseheaders(root, fp)
  File "/usr/local/lib/python2.2/email/Parser.py", line 128, in _parseheaders
    raise Errors.HeaderParseError(
email.Errors.HeaderParseError: Not a header, not a continuation: ``Thanks for the reply Ralf,''
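
In the logged message text, `X-Spambayes-Classification: unsure\r\n` is followed directly by `Thanks for the reply Ralf,\r\n` with no blank line between them, so the parser meets a body line while it is still reading headers. That suggests (an inference from the log, not something confirmed in this thread) that the step which added the classification header dropped the header/body separator. The sketch below uses two hypothetical helpers, `add_header_raw` and `missing_separator`, which are not part of imapfilter.py. It targets current Python 3, whose `email` parser, under the default `compat32` policy, records a `MissingHeaderBodySeparatorDefect` on the message rather than raising as the Python 2.2 parser in the traceback does:

```python
import email
from email import errors


def missing_separator(raw: str) -> bool:
    """True if the parser had to guess where the headers end."""
    msg = email.message_from_string(raw)
    return any(isinstance(d, errors.MissingHeaderBodySeparatorDefect)
               for d in msg.defects)


def add_header_raw(raw: str, name: str, value: str) -> str:
    """Append 'name: value' to the header block of a raw message while
    keeping the blank line that separates the headers from the body."""
    # Locate the first header/body separator, CRLF or bare LF style.
    found = [(raw.find(sep), sep) for sep in ("\r\n\r\n", "\n\n") if sep in raw]
    if found:
        idx, sep = min(found)
        headers, body = raw[:idx], raw[idx + len(sep):]
    else:
        # Headers only, no body: terminate the block ourselves.
        sep = "\r\n\r\n" if "\r\n" in raw else "\n\n"
        headers, body = raw.rstrip("\r\n"), ""
    eol = sep[: len(sep) // 2]  # one line ending in the message's style
    prefix = headers + eol if headers else ""
    return f"{prefix}{name}: {value}{sep}{body}"


# The shape seen in the log: the new header runs straight into the body.
broken = ("Subject: hi\r\nX-Spambayes-Classification: unsure\r\n"
          "Thanks for the reply Ralf,\r\n")
print(missing_separator(broken))  # True

fixed = add_header_raw("Subject: hi\r\n\r\nThanks for the reply Ralf,\r\n",
                       "X-Spambayes-Classification", "unsure")
print(missing_separator(fixed))  # False
```

Splitting on the first blank line before inserting, rather than appending after the last header line found by scanning, keeps the separator intact whichever line-ending style the server returns.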

> BTW, I am working on the headers-with-no-line-endings thing (I grabbed
> the el source that you suggested), but it's taking a while...

Good luck!

-- 
Dave Abrahams
Boost Consulting
www.boost-consulting.com




More information about the Spambayes mailing list