[spambayes-dev] Re: 1070 spam, 1 false positive

Greg Ward greg at python.org
Fri Jun 20 23:06:30 EDT 2003


On 20 June 2003, Martijn Pieters said:
> Sorry to be a party pooper, but there were more false positives; I rescued
> 2 earlier this week. The following message was also marked as spam:

Darn.  But there's good news:

> From: "Tom Deprez" <tom at aragne.com>
> To: <europython at python.org>,
>         <europython-announce at python.org>,
>         <zope-announce at zope.org>,
>         <python-announce at python.org>,
>         <eurozope at comlounge.net>
> Subject: EuroPython news
> Date: Mon, 16 Jun 2003 14:43:45 +0200

This one was rejected fairly early in the Spambayes regime.  I just
scored it with the current training set, and it scored < 0.1.

Also, for some reason the envelope recipient of that message was *just*
zope-announce at zope.org, in spite of what the "To" header says.  I bet if
that message had really been sent to europython at python.org, it would
have been flagged UNSURE.  No way to tell now, though, since I don't
have the training DB from Monday.

> From: "Morten W. Petersen" <morten at nidelven-it.no>
> To: zope-dev at zope.org
> Subject: Renaming a product
> X-Mailer: NeoMail 1.25
> X-IPAddress: 80.202.17.36
> MIME-Version: 1.0
> Content-Type: text/plain; charset=iso-8859-1
> Message-Id: <E19SKzG-0002Fj-00 at dns.activemedia.no>
> Date: Tue, 17 Jun 2003 20:15:18 +0200
> X-Virus-Scanned: by AMaViS 0.3.12
> X-AntiAbuse: This header was added to track abuse, please include it with
>         any abuse report
> X-AntiAbuse: Primary Hostname - dns.activemedia.no
> X-AntiAbuse: Original Domain - zope.org
> X-AntiAbuse: Originator/Caller UID/GID - [32940 1441] / [32940 1441]
> X-AntiAbuse: Sender Address Domain - dns.activemedia.no
> X-Spam-Status: SPAM (lists-zope 0.854)

And this one was treated very badly because of the X-AntiAbuse headers;
here's how it scores with the current DB:

Y 0.869 save/ham/cur/19SKzr-0000NR-00:2,S
        '*H*': 0.060
        '*S*': 0.797
        'all,': 0.065
        'message-id:skip:d 10': 0.065
        'zodb': 0.065
        'does': 0.086
        'product,': 0.092
        'thanks,': 0.092
        '(with': 0.155
        'instances': 0.155
        'python,': 0.155
        'return-path:skip:d 10': 0.155
        'date:0200': 0.173
        'date:Tue': 0.191
        'anyone': 0.230
        'content-type:text/plain': 0.266
        'received:62': 0.303
        'product': 0.379
        'know': 0.380
        'header:Received:3': 0.388
        'to:no real name:2**0': 0.610
        'date:Jun': 0.627
        'after': 0.635
        'charset:iso-8859-1': 0.641
        'work': 0.645
        'proto:http': 0.656
        'stored': 0.666
        'new': 0.715
        'to:addr:zope-dev': 0.789
        'to:dev': 0.789
        'url:www': 0.800
        'number:': 0.811
        'x-antiabuse:Address': 0.811
        'x-antiabuse:Caller': 0.811
        'x-antiabuse:Domain': 0.811
        'x-antiabuse:GID': 0.811
        'x-antiabuse:Hostname': 0.811
        'x-antiabuse:Original': 0.811
        'x-antiabuse:Originator': 0.811
        'x-antiabuse:Primary': 0.811
        'x-antiabuse:Sender': 0.811
        'x-antiabuse:This': 0.811
        'x-antiabuse:UID': 0.811
        'x-antiabuse:abuse': 0.811
        'x-antiabuse:added': 0.811
        'x-antiabuse:any': 0.811
        'x-antiabuse:header': 0.811
        'x-antiabuse:include': 0.811
        'x-antiabuse:please': 0.811
        'x-antiabuse:report': 0.811
        'x-antiabuse:track': 0.811
        'x-antiabuse:was': 0.811
        'x-antiabuse:with': 0.811
        'x-antiabuse:zope.org': 0.811
        'phone': 0.971

But if I add x-antiabuse to basic_header_skip, it comes through fine:

N 0.085 save/ham/cur/19SKzr-0000NR-00:2,S
        '*H*': 0.877
        '*S*': 0.047
        'all,': 0.065
        'message-id:skip:d 10': 0.065
        'zodb': 0.065
        'does': 0.086
        'product,': 0.092
        'thanks,': 0.092
        '(with': 0.155
        'instances': 0.155
        'python,': 0.155
        'return-path:skip:d 10': 0.155
        'date:0200': 0.173
        'date:Tue': 0.191
        'anyone': 0.230
        'content-type:text/plain': 0.266
        'received:62': 0.303
        'product': 0.379
        'know': 0.380
        'header:Received:3': 0.388
        'to:no real name:2**0': 0.610
        'date:Jun': 0.627
        'after': 0.635
        'charset:iso-8859-1': 0.641
        'work': 0.645
        'proto:http': 0.656
        'stored': 0.666
        'new': 0.715
        'to:addr:zope-dev': 0.789
        'to:dev': 0.789
        'url:www': 0.800
        'number:': 0.811
        'phone': 0.971
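
As a cross-check on the two listings above: the headline scores are
consistent with combining the '*H*' and '*S*' clues as (S - H + 1) / 2.
The sketch below only redoes that arithmetic from the rounded values
printed in the listings; the function name is made up for illustration,
and this is not the classifier's actual code.

```python
def combined_score(h, s):
    """Fold the ham indicator H and spam indicator S into one score in [0, 1]."""
    return (s - h + 1.0) / 2.0

# '*H*' and '*S*' as printed in the two listings (already rounded to 3 places).
with_antiabuse = combined_score(h=0.060, s=0.797)
without_antiabuse = combined_score(h=0.877, s=0.047)

# The inputs are rounded, so expect agreement with the listed
# 0.869 and 0.085 only to within about 0.001.
print(with_antiabuse, without_antiabuse)
```

Dropping the 22 x-antiabuse clues at 0.811 is enough to flip H and S,
which is why the same message goes from Y to N.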

I'm building training DBs with x-antiabuse excluded now, to see how it
helps/hurts.  Another lively Friday night chez Greg...
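
To make the effect of the skip list concrete, here is a minimal sketch of
header tokenizing with a skip list, assuming each header word becomes a
"name:word" token as in the listings above. Only the option name
basic_header_skip comes from this thread; header_tokens, WORD_RE, and the
sample message are hypothetical, and the real tokenizer does more than
this.

```python
import re
from email import message_from_string

# Assumption: header words are runs of word characters plus "$", "." and "%".
WORD_RE = re.compile(r"[\w$.%]+")

def header_tokens(raw_message, skip_patterns):
    """Return 'name:word' tokens for every header not matched by a skip pattern."""
    skip = [re.compile(p, re.IGNORECASE) for p in skip_patterns]
    msg = message_from_string(raw_message)
    tokens = []
    for name, value in msg.items():
        name = name.lower()
        # A header is skipped when its whole name matches any pattern.
        if any(rx.fullmatch(name) for rx in skip):
            continue
        for word in WORD_RE.findall(value):
            tokens.append(f"{name}:{word}")
    return tokens

# Hypothetical two-header message modelled on the headers quoted above.
RAW = (
    "X-Mailer: NeoMail 1.25\n"
    "X-AntiAbuse: Original Domain - zope.org\n"
    "\n"
    "body\n"
)

all_tokens = header_tokens(RAW, [])
skipped_tokens = header_tokens(RAW, ["x-antiabuse"])
print(all_tokens)
print(skipped_tokens)
```

With the skip pattern in place, none of the x-antiabuse:* tokens are
generated, so they can neither be trained on nor scored.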

        Greg
-- 
Greg Ward <gward at python.net>                         http://www.gerg.ca/
Never put off till tomorrow what you can put off till the day after tomorrow.


