[Spambayes] train on blank spam messages

Ryan Malayter rmalayter at bai.org
Wed Feb 18 09:17:53 EST 2004


I frequently get entirely blank messages (no subject or body, sometimes
even without a From or To address). These are obviously abortive spam
attempts, generated by buggy spamware.
 
Should I train on these? I have been, figuring that SpamBayes could at
least generate subject:None tokens, and perhaps something from the
Received headers. However, I notice that SpamBayes doesn't mine the
class-B or class-C network from Received headers. Has this been
tried? Or is it useless in this day of spam-spewing, compromised home
machines? That a message came directly from a machine on PacBell's DSL
network, rather than through a well-known PacBell SMTP relay, would seem
to be a fairly strong spam clue to me.
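To make the class-B/class-C idea concrete, here is a minimal sketch using only Python's standard email and re modules. The function received_tokens, the 'received:' token names, and the keyword list used to spot DSL/dial-up host names are all hypothetical illustrations, not part of SpamBayes:

```python
import re
from email import message_from_string

# Dotted-quad IPv4 address; a loose sketch, not a strict validator.
IP_RE = re.compile(r"\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\b")

# Hypothetical heuristic: reverse-DNS names that contain a dial-up/DSL keyword
# or an address written with dashes usually belong to end-user machines.
DYNAMIC_RE = re.compile(r"\b(?:adsl|dsl|dialup|dhcp|cable|ppp)\b|\d+-\d+-\d+-\d+",
                        re.IGNORECASE)


def received_tokens(raw_message):
    """Yield hypothetical 'received:' tokens mined from every Received header."""
    msg = message_from_string(raw_message)
    for header in msg.get_all("Received", []):
        for a, b, c, d in IP_RE.findall(header):
            if max(int(a), int(b), int(c), int(d)) > 255:
                continue  # not a real IPv4 address
            yield "received:%s.%s" % (a, b)        # class-B (/16) network
            yield "received:%s.%s.%s" % (a, b, c)  # class-C (/24) network
        # Check the host name after "from" (the machine that handed the message
        # to this hop) and flag it if it looks like a DSL/dial-up pool address.
        match = re.match(r"\s*from\s+(\S+)", header)
        if match and DYNAMIC_RE.search(match.group(1)):
            yield "received:dynamic-host"
```

On the two Received headers in the example below, this yields 'received:67.125' and 'received:67.125.217' for both hops, 'received:44.54' and 'received:44.54.208' for the second hop, and one 'received:dynamic-host' for the adsl-67-125-217-122...pacbell.net name. The prefix tokens only pay off once a network has shown up in training; the host-name pattern is the part that might generalize across the many DSL and cable pools that compromised machines sit on.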
 
Here's an example:
-----------------------------
Combined Score: 26% (0.258597)
Internal ham score (*H*): 0.829696
Internal spam score (*S*): 0.346891

# ham trained on: 3634
# spam trained on: 3146

7 Significant Tokens
token                               spamprob         #ham  #spam
'from:none'                         0.013024          367      4
'message-id:invalid'                0.0680032         367     23
'reply-to:none'                     0.335768         3028   1325
'cc:none'                           0.623476         2025   2903
'header:Received:2'                 0.745428          360    913
'to:none'                           0.755966           14     38
'sender:none'                       0.767726         1057   3025
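The 26% can be checked against the seven clues above. The numbers are consistent with each token's spamprob being computed from its #ham/#spam counts with Robinson's adjustment (strength s = 0.45 and prior x = 0.5, which I am assuming are the defaults in use), then merged by chi-squared (Fisher) combining, with Combined = (S - H + 1) / 2. This is a reconstruction for checking the arithmetic, not SpamBayes's own code; the function names are made up:

```python
import math

# (token, #ham, #spam) copied from the clue table above.
CLUES = [
    ("from:none", 367, 4),
    ("message-id:invalid", 367, 23),
    ("reply-to:none", 3028, 1325),
    ("cc:none", 2025, 2903),
    ("header:Received:2", 360, 913),
    ("to:none", 14, 38),
    ("sender:none", 1057, 3025),
]
NHAM, NSPAM = 3634, 3146  # "# ham trained on" / "# spam trained on"


def chi2Q(x2, v):
    """P(chi-squared with v degrees of freedom >= x2), for even v."""
    assert v % 2 == 0
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def token_spamprob(ham_count, spam_count, nham, nspam, s=0.45, x=0.5):
    """Robinson-adjusted spam probability for one token (counts must not both be 0)."""
    hamratio = ham_count / nham
    spamratio = spam_count / nspam
    p = spamratio / (hamratio + spamratio)
    n = ham_count + spam_count
    # Shrink p toward the prior x; s says how much weight the prior gets.
    return (s * x + n * p) / (s + n)


def chi2_combine(probs):
    """Return (H, S, combined) for a list of clue probabilities."""
    n = len(probs)
    # The product of p (of 1 - p) is tiny when the clues are hammy (spammy),
    # which drives -2*ln(product) up and the chi-squared tail probability down.
    ln_p = sum(math.log(p) for p in probs)
    ln_1mp = sum(math.log(1.0 - p) for p in probs)
    H = 1.0 - chi2Q(-2.0 * ln_p, 2 * n)
    S = 1.0 - chi2Q(-2.0 * ln_1mp, 2 * n)
    return H, S, (S - H + 1.0) / 2.0


if __name__ == "__main__":
    probs = [token_spamprob(h, sp, NHAM, NSPAM) for _, h, sp in CLUES]
    for (token, _, _), p in zip(CLUES, probs):
        print("%-20s %.6f" % (token, p))
    H, S, combined = chi2_combine(probs)
    print("H=%.6f S=%.6f combined=%.6f" % (H, S, combined))
```

The per-token values should match the spamprob column, and H, S and the combined score should come out very close to the 0.8297, 0.3469 and 0.2586 reported above. With no body text, the verdict rests on header tokens alone, and the two strongly hammy ones ('from:none' and 'message-id:invalid', each seen in 367 ham) outweigh the mildly spammy rest.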

Message Stream
X-MS-Mail-Gibberish: Microsoft Mail Internet Headers Version 2.0
Received: from adsl-67-125-217-122.dsl.lsan03.pacbell.net ([67.125.217.122])
	by smtp.bai.org with Microsoft SMTPSVC(5.0.2195.6713);
	Wed, 18 Feb 2004 00:35:23 -0600
Received: from 44.54.208.104 by 67.125.217.122; Wed, 18 Feb 2004
	07:37:38 +0100
Message-ID: <E[20


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7036.0">
<TITLE></TITLE>
</HEAD>
<BODY>
<!-- Converted from text/plain format -->

</BODY>
</HTML>

All Message Tokens
10 unique tokens

'cc:none'
'content-type:text/plain'
'from:none'
'header:Message-ID:1'
'header:Received:2'
'message-id:invalid'
'reply-to:none'
'sender:none'
'to:none'
'x-mailer:none'
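All ten tokens above come from the headers alone. A hypothetical header tokenizer that reproduces them for this message might look like the following; the list of headers checked for absence and the Message-ID pattern are my assumptions, chosen to match this example rather than copied from SpamBayes. The 'content-type:text/plain' token is worth noting: the message has no Content-Type header, so Python's email package reports the default text/plain (the HTML in the message stream is Exchange's rendering, per its "Converted from text/plain format" comment).

```python
import re
from collections import Counter
from email import message_from_string

# Hypothetical: headers whose absence is turned into a "<name>:none" token.
# Chosen to match this example's token list, not taken from SpamBayes.
ABSENT_CHECK = ("cc", "from", "reply-to", "sender", "to", "x-mailer")

# Loose "<local@domain>" shape; anything else counts as invalid (an assumption).
MSGID_RE = re.compile(r"<[^<>@\s]+@[^<>@\s]+>")


def header_tokens(raw_message):
    """Yield header-only tokens similar to the 'All Message Tokens' list."""
    msg = message_from_string(raw_message)
    # One token per distinct header name, carrying how many times it occurs.
    for name, count in Counter(msg.keys()).items():
        yield "header:%s:%d" % (name, count)
    for name in ABSENT_CHECK:
        if msg[name] is None:  # header lookup is case-insensitive
            yield "%s:none" % name
    # With no Content-Type header, the email package reports text/plain.
    yield "content-type:%s" % msg.get_content_type()
    msgid = msg["Message-ID"]
    if msgid is not None and not MSGID_RE.fullmatch(msgid.strip()):
        yield "message-id:invalid"
```

Fed the two Received headers and the truncated Message-ID from the message stream, this yields exactly the ten tokens listed. It also shows why training on such messages teaches the classifier little beyond "headers are missing": there is nothing else to tokenize.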
