[Spambayes] filter misclassification

K. H. Gowranga gowranga at serc.iisc.ernet.in
Mon May 2 13:58:19 CEST 2005


Hello,

Kindly ignore my previous mail. The clues were obtained using

/usr/local/bin/showclues.py -d .hammiedb specimen_file > out_filename

But before doing this, I retrained my database .hammiedb using

sb_mboxtrain.py -d .hammiedb -g mail/ham \
                      -s mail/spam > /dev/null 2>&1


to get the revised contents of out_filename as:

**************************************************************************
Combined Score: 6% (0.0565183)
**************************************************************************
Internal ham score (*H*): 0.997835
Internal spam score (*S*): 0.110872

# ham trained on: 771
# spam trained on: 56
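
As a rough check on the figures above, the combined score appears to follow from the two internal scores as (S - H + 1) / 2. The sketch below assumes that relation (it matches the chi-squared combining I understand SpamBayes to use, but treat it as an assumption) and plugs in the values printed by showclues:

```python
# Internal scores copied from the showclues output above.
H = 0.997835  # internal ham score (*H*)
S = 0.110872  # internal spam score (*S*)

# Assumed combining rule: map S - H from [-1, 1] onto [0, 1].
combined = (S - H + 1.0) / 2.0

print(f"combined score: {combined:.6f} ({round(combined * 100)}%)")
```

The result agrees with the reported 0.0565183 to about six decimal places; the small difference comes from H and S being printed in rounded form.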

**************************************************************************
36 Significant Tokens
**************************************************************************
token                               spamprob         #ham  #spam

'science'                           0.003386           66      0
'work.'                             0.00819672         27      0
'sending'                           0.0100223          22      0
'url:htm'                           0.012894           17      0
'here:'                             0.0412844           5      0
'unsubscribe'                       0.0505618           4      0
'28,'                               0.0652174           3      0
'story'                             0.0652174           3      0
'as:'                               0.0918367           2      0
'nasa'                              0.0918367           2      0
'skip:l 40'                         0.0918367           2      0
'hunt'                              0.155172            1      0
'service.'                          0.155172            1      0
'url:l'                             0.155172            1      0
'url:t'                             0.155172            1      0
'email'                             0.168474           69      1
'2005'                              0.267918          151      4
'april'                             0.324552           29      1
'charset:iso-8859-1'                0.398251          125      6
'sender:none'                       0.624637          339     41
'header:Received:2'                 0.631978          104     13
'full'                              0.652969           29      4
'available'                         0.661128           35      5
'people'                            0.676645           26      4
'header:Message-Id:1'               0.699878          100     17
'send'                              0.734715           94     19
'subscribed'                        0.752178            4      1
'content-type:multipart/alternative' 0.773119           56     14
'content-type:text/html'            0.798317           55     16
'news'                              0.809481            9      3
'currently'                         0.844613           27     11
'email addr:serc.iisc.ernet.in.'    0.844828            0      1
'url:n'                             0.844828            0      1
'url:id'                            0.847125            4      2
'blank'                             0.884802           15      9
'plans'                             0.932517            2      4
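
The spamprob column can be reproduced from the #ham and #spam counts together with the training totals (771 ham, 56 spam). The sketch below is an illustration, not the SpamBayes source: the function name is hypothetical, and the smoothing constants s = 0.45 and x = 0.5 are assumed values (I believe they correspond to the default unknown-word strength and probability) that happen to reproduce the numbers in the table.

```python
def token_spamprob(ham_count, spam_count, nham, nspam, s=0.45, x=0.5):
    """Estimate a token's spam probability from training counts.

    s and x are assumed smoothing constants: s is the weight given to
    the prior, x is the prior probability for a token with no evidence.
    Expects at least one of ham_count / spam_count to be non-zero.
    """
    # Fraction of ham and spam messages that contained the token.
    ham_ratio = ham_count / nham
    spam_ratio = spam_count / nspam

    # Raw probability, ignoring how little evidence there may be.
    prob = spam_ratio / (ham_ratio + spam_ratio)

    # Pull the raw value toward x; the pull weakens as n grows.
    n = ham_count + spam_count
    return (s * x + n * prob) / (s + n)


NHAM, NSPAM = 771, 56  # training totals from the output above

for token, ham, spam in [("science", 66, 0), ("email", 69, 1), ("plans", 2, 4)]:
    print(f"{token!r:12} {token_spamprob(ham, spam, NHAM, NSPAM):.6f}")
```

With only 56 spam against 771 ham, a single spam occurrence carries far more weight than a single ham occurrence, which is why tokens such as 'sender:none' (339 ham, 41 spam) still end up above 0.5.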

**************************************************************************
Message Stream
**************************************************************************

Return-Path: <bounce-14810-111417 at lyris.msfc.nasa.gov>
X-Original-To: gowranga at serc.iisc.ernet.in
Delivered-To: gowranga at serc.iisc.ernet.in
Received: from iisc.ernet.in (iisc.ernet.in [144.16.64.3])
	by serc.iisc.ernet.in (Postfix) with ESMTP id 26B7F180F
	for <gowranga at serc.iisc.ernet.in>; Fri, 29 Apr 2005 09:54:31 +0530 (IST)
Received: from lyris.msfc.nasa.gov (www.spaceweather2.com [72.3.135.213])
	by iisc.ernet.in (8.12.9/8.12.8) with SMTP id j3TA0EBM048312
	for <gowranga at serc.iisc.ernet.in>; Fri, 29 Apr 2005 10:00:24 GMT
From: NASA Science News <snglist at lyris.msfc.nasa.gov>
To: NASA Science News <snglist at lyris.msfc.nasa.gov>
Subject: Prospecting for Lunar Water
Date: Thu, 28 Apr 2005 21:33:17 -0500
MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="MIMEBoundarydf0496b01443e4ff2e2a4222f3c6cd6f"
List-Unsubscribe: <mailto:leave-snglist-111417C at lyris.msfc.nasa.gov>
Message-Id: <LYRIS-111417-14810-2005.04.28-21.33.19--gowranga#serc.iisc.ernet.in at lyris.msfc.nasa.gov>
X-Folder: Bulk
Status:
X-Status:
X-Keywords:

This is a multi-part message in MIME format.

--MIMEBoundarydf0496b01443e4ff2e2a4222f3c6cd6f
Content-Type: text/plain;
	charset="ISO-8859-1"
Content-Transfer-Encoding: 8bit

NASA Science News for April 28, 2005

Settling alien worlds is thirsty work. Before sending people back to the Moon, NASA plans to send a robotic spacecraft first to hunt for water.

FULL STORY at

http://science.nasa.gov/headlines/y2005/28apr_lro.htm?list111417

The Science at NASA Podcast feed is available at http://science.nasa.gov/podcast.xml.


You are currently subscribed to snglist as: gowranga at serc.iisc.ernet.in.

This is a free service.

To unsubscribe click here: http://lyris.msfc.nasa.gov/u?id=111417C&n=T&l=snglist
or send a blank email to leave-snglist-111417C at lyris.msfc.nasa.gov

--MIMEBoundarydf0496b01443e4ff2e2a4222f3c6cd6f
Content-Type: text/html;
	charset="ISO-8859-1"
Content-Transfer-Encoding: 8bit

<HTML><BODY>
NASA Science News for April 28, 2005<p>
Settling alien worlds is thirsty work. Before sending people back to the Moon, NASA plans to send a robotic spacecraft first to hunt for water. <p>
FULL STORY at<p>
<a href="http://science.nasa.gov/headlines/y2005/28apr_lro.htm?list111417">http://science.nasa.gov/headlines/y2005/28apr_lro.htm?list111417</a><p>
The Science at NASA Podcast feed is available at <a href="http://science.nasa.gov/podcast.xml.">http://science.nasa.gov/podcast.xml.</a> <p>
<br>
You are currently subscribed to snglist as: <a href="mailto:gowranga at serc.iisc.ernet.in">gowranga at serc.iisc.ernet.in</a>. <p>
This is a free service.<p>
To unsubscribe click here: http://lyris.msfc.nasa.gov/u?id=111417C&n=T&l=snglist<br>
or send a blank email to <a href="mailto:leave-snglist-111417C at lyris.msfc.nasa.gov">leave-snglist-111417C at lyris.msfc.nasa.gov</a>
</body></HTML>

--MIMEBoundarydf0496b01443e4ff2e2a4222f3c6cd6f--


**************************************************************************
All Message Tokens
**************************************************************************
97 unique tokens

'2005'
'28,'
'alien'
'april'
'are'
'as:'
'available'
'back'
'before'
'blank'
'cc:none'
'charset:iso-8859-1'
'click'
'content-type:multipart/alternative'
'content-type:text/html'
'content-type:text/plain'
'currently'
'email'
'email addr:serc.iisc.ernet.in.'
'email name:gowranga'
'feed'
'first'
'for'
'free'
'from:addr:lyris.msfc.nasa.gov'
'from:addr:snglist'
'from:name:nasa science news'
'full'
'header:Date:1'
'header:From:1'
'header:MIME-Version:1'
'header:Message-Id:1'
'header:Received:2'
'header:Return-Path:1'
'header:Subject:1'
'header:To:1'
'here:'
'hunt'
'message-id:@lyris.msfc.nasa.gov'
'moon,'
'nasa'
'news'
'people'
'plans'
'podcast'
'proto:http'
'reply-to:none'
'robotic'
'science'
'science at nasa'
'send'
'sender:none'
'sending'
'service.'
'settling'
'skip:l 40'
'snglist'
'spacecraft'
'story'
'subject: '
'subject:Lunar'
'subject:Prospecting'
'subject:Water'
'subject:for'
'subscribed'
'the'
'thirsty'
'this'
'to:2**0'
'to:addr:lyris.msfc.nasa.gov'
'to:addr:snglist'
'to:name:nasa science news'
'unsubscribe'
'url:111417c'
'url:28apr_lro'
'url:gov'
'url:headlines'
'url:htm'
'url:id'
'url:l'
'url:list111417'
'url:lyris'
'url:msfc'
'url:n'
'url:nasa'
'url:podcast'
'url:science'
'url:snglist'
'url:t'
'url:u'
'url:xml'
'url:y2005'
'water.'
'work.'
'worlds'
'x-mailer:none'
'you'


Thanks.

-gowranga

On Mon, 2 May 2005, Tony Meyer wrote:

>
> How did you obtain this list?  It shows the clues for an empty database
> (i.e. you haven't trained any mail).  The clue list will always be empty in
> this case, and all mail will score 0.5.  I presume this isn't the database
> that you're using, though, so this was a problem in obtaining the clues
> list.
>

