[Spambayes] For the bold

Tim Peters tim.one@comcast.net
Fri, 04 Oct 2002 15:43:26 -0400


BTW, that teensy test run I reported on uncovered a ham hiding in BruceG's
spam -- it was one of the "false negatives" the central-limit scheme said it
was unsure about, but *guessed* it was ham (note that both zscores are very
large):

"""
Data/Spam/Set5/6510.txt
prob = 0.49
prob('*zham*') = -31.6082
prob('*zspam*') = -44.4025
prob('header:Organization:1') = 0.00738916
prob('wrote:') = 0.0110024
prob('header:User-Agent:1') = 0.0167286
prob('class') = 0.0412844
prob('files') = 0.0412844
prob('comes') = 0.0505618
prob('hi,') = 0.0652174
prob('might') = 0.12963
prob('subject:: ') = 0.135891
prob('contains:') = 0.155172
prob('files.') = 0.155172
prob('there.') = 0.155172
prob('inc.') = 0.155172
prob('subject:?') = 0.194323
prob('charset:us-ascii') = 0.244597
prob('line') = 0.263314
prob('content-type:text/plain') = 0.306763
prob('proto:http') = 0.681245
prob('skip:p 10') = 0.691388
prob('will') = 0.700267
prob('url:org') = 0.701342
prob('url:www') = 0.702475
prob('easily') = 0.724719
prob('been') = 0.740964
prob('your') = 0.752572
prob('addresses') = 0.775229
prob('subject:-') = 0.775229
prob('people') = 0.776817
prob('url:html') = 0.776817
prob('world') = 0.810078
prob('subject:000') = 0.844828
prob('subject:. ') = 0.844828
prob('ease') = 0.844828
prob('sent') = 0.85503
prob('bulk') = 0.908163
prob('subject:,') = 0.908163
prob('emails') = 0.908163
prob('low') = 0.908163
prob('our') = 0.918944
prob('regardless') = 0.934783
prob('received.') = 0.934783
prob('info') = 0.958716
prob('million') = 0.965116
prob('send') = 0.969799
prob('unsubscribe') = 0.969799
prob('header:Return-Path:1') = 0.971807
prob('header:Received:7') = 0.973373
prob('money') = 0.983271
prob('email') = 0.991159
prob('please') = 0.991803

Return-Path: <linux-kernel-owner@vger.kernel.org>
Delivered-To: lists-linux-kernel@bruce-guenter.dyndns.org
Received: (qmail 27880 invoked from network); 16 Apr 2002 17:23:30 -0000
Received: from vger.kernel.org (209.116.70.75)
  by bruce-guenter.dyndns.org (192.168.1.3) with ESMTP; 16 Apr 2002
17:23:30 -0000
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id <S313775AbSDPRUh>; Tue, 16 Apr 2002 13:20:37 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
        id <S313776AbSDPRUg>; Tue, 16 Apr 2002 13:20:36 -0400
Received: from moutvdomng1.kundenserver.de ([212.227.126.181]:56795 "EHLO
        moutvdomng1.kundenserver.de") by vger.kernel.org with ESMTP
        id <S313775AbSDPRUf>; Tue, 16 Apr 2002 13:20:35 -0400
Received: from [212.227.126.155] (helo=mrvdomng2.kundenserver.de)
        by moutvdomng1.kundenserver.de with esmtp (Exim 3.22 #2)
        id 16xWd5-0001Gw-00
        for linux-kernel@vger.kernel.org; Tue, 16 Apr 2002 19:20:31 +0200
Received: from pd9e23b10.dip.t-dialin.net ([217.226.59.16]
helo=ngforever.de)
        by mrvdomng2.kundenserver.de with esmtp (Exim 3.22 #2)
        id 16xWd4-0007sA-00
        for linux-kernel@vger.kernel.org; Tue, 16 Apr 2002 19:20:31 +0200
Message-ID: <3CBC5D5D.7060909@ngforever.de>
Date:   Tue, 16 Apr 2002 11:20:29 -0600
From:   Thunder from the hill <thunder@ngforever.de>
Organization: The LuckyNet Administration
User-Agent: Mozilla/5.0 (X11; U; Linux i586; en-US; rv:0.9.9+)
Gecko/20020405
X-Accept-Language: en-us, en
MIME-Version: 1.0
To:     LKML <linux-kernel@vger.kernel.org>
Subject: Re: 60 Million Emails inc. 600,000 Uk =?ISO-8859-1?Q?=A319=2E95?=
References: <20020416154606Z313666-22651+7853@vger.kernel.org>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 948

Hi,

Bulk Email Cd wrote:
> Bulk Email CD just #19.95 inc. p&p and contains:
>
> 60 Million World wide email addresses.
> 600,000 VALIDATED UK email addresses - Verified in March 2002, ensu=
ring a
low failure rate.
>
> The World-wide emails have been split and compressed into many file=
s for
ease of use. The UK lists comes in easily identifiable files.
>
> The CD comes with simple instuctions and will be sent by first clas=
s post
as soon as your money has been received.
> [Snip]

People selling email addresses really make me sick. People degraded to
wares, regardless of their personalities. We might even find Alan in there.

Regards,
Thunder
--
Thunder from the hill.
Citizen of our universe.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
"""

If I have to keep the quote of the Nigerian-scam spam in my ham, this one
has noooooo excuse for being called spam <wink>.
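
For anyone trying to read the two zscores above, here is a rough sketch of
the kind of decision being described: guess the class whose zscore is
closer to 0, and call the result unsure when even that zscore is far from 0.
This is an illustration only, not the code in the classifier. The function
name and the cutoff of 10.0 are made up for the example.

```python
def guess_from_zscores(zham, zspam, unsure_cutoff=10.0):
    """Return (guess, certain) from the two central-limit zscores.

    Hypothetical helper: unsure_cutoff is an assumed value chosen for
    illustration, not a setting taken from the real classifier.
    """
    ham_distance = abs(zham)      # how badly the msg fits the ham population
    spam_distance = abs(zspam)    # how badly the msg fits the spam population
    guess = "ham" if ham_distance < spam_distance else "spam"
    # If even the better-fitting population is a poor fit, don't trust the guess.
    certain = min(ham_distance, spam_distance) < unsure_cutoff
    return guess, certain


# The zscores reported for Data/Spam/Set5/6510.txt:
print(guess_from_zscores(-31.6082, -44.4025))   # ('ham', False)
```

Both zscores are huge in magnitude, so the sketch reports an uncertain
guess of ham, which matches what the test run said about this message.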

Here's another one it was unsure about:

"""
Return-Path: <jax@inet.pl>
Delivered-To: em-ca-bruceg@em.ca
Received: (qmail 15516 invoked from network); 8 Aug 2002 22:20:13 -0000
Received: from mail.inet.pl (195.116.59.85)
  by churchill.factcomp.com with SMTP; 8 Aug 2002 22:20:13 -0000
Received: (qmail 26458 invoked by uid 33); 8 Aug 2002 22:26:04 -0000
Date: 8 Aug 2002 22:26:04 -0000
Message-ID: <20020808222604.26455.qmail@mail.inet.pl>
TO: bruceg@em.ca
From: jax@inet.pl
Subject: Wiadomość została dostarczona
Content-Length: 129

Twoja Wiadomość została dostarczona !
 Zostanie jednak przeczytana 12 sierpnia.
 Do tego czasu korzystam z wypoczynku.
"""

I have no idea -- do you?  I really despise the presumption that non-English
msgs are spam, BTW.