[spambayes-bugs] [ spambayes-Support Requests-798318 ] Blackberry redirector e-mails identified as (maybe-)SPAM

SourceForge.net noreply at sourceforge.net
Fri Jul 16 05:21:56 CEST 2004


Support Requests item #798318, was opened at 2003-09-01 12:09
Message generated for change (Comment added) made by anadelonbrin
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498104&aid=798318&group_id=61702

Category: None
Group: None
>Status: Closed
Priority: 5
Submitted By: Benjamin W. Slivka (benslivka)
Assigned to: Nobody/Anonymous (nobody)
Summary: Blackberry redirector e-mails identified as (maybe-)SPAM

Initial Comment:
Hi!

I installed SpamBayes 0.7 on my Windows XP Pro SP1 
system late Thursday evening 8/28.  Here it is Sunday 
8/31, and SB is working very nicely!

BUT, it doesn't seem to have learned that RIM/
Blackberry Redirector e-mails are non-spam.  It pretty 
regularly puts them in the "maybe spam" folder.

I've included the detailed "spam report" on one such 
message below.

Otherwise, SB is working great with my Blackberry -- if it 
identifies an inbox message as Spam it moves it out of 
the inbox before the Blackberry Redirector has a chance 
to send it to my Blackberry!

Please let me know if I can provide more information.

Thank you!
--Ben Slivka
Ben [at] Slivka [dot] com
Clyde Hill, WA (near Seattle)

Spam Score: 0.207205


word                                spamprob     #ham  #spam
'*H*'                               0.663497        -      -
'*S*'                               0.0779066       -      -
'between'                           0.0221876    2566      0
'blackberry'                        0.0815819    3592      1
'processed'                         0.249116      120      0
'move'                              0.264701     1577      2
'data'                              0.270535     1529      2
'header:Message-Id:1'               0.302488     5573      9
'used'                              0.318698     2332      4
'not'                               0.37607     14097     32
'reply-to:none'                     0.390848    27683     67
'to:addr:ben'                       0.60174     18381    105
'header:MIME-Version:1'             0.614831    13414     81
'to:addr:slivka.com'                0.616083    17303    105
'from:no real name:2**0'            0.669564     2714     21
'to:no real name:2**0'              0.687821    12466    104
'header:Return-Path:1'              0.744835     9939    110
'carry'                             0.779405      332      5
Message Stream:


X-MS-Mail-Gibberish: Microsoft Mail Internet Headers Version 2.0
Received: from BlackBerry.NET ([206.51.26.40]) by janus.slivka.org with
	Microsoft SMTPSVC(5.0.2195.6713); Sun, 31 Aug 2003 16:06:27 -0700
Received: from smtprelay01.etp.prod.on.blackberry
	(smtprelay01.etp.prod.on.blackberry [172.16.147.240])
	by BlackBerry.NET (8.12.9+Sun/8.12.9) with ESMTP id h7VN6hvs008489
	for <ben at slivka.com>; Sun, 31 Aug 2003 19:06:44 -0400 (EDT)
Received: from etp02.etp.prod.on.blackberry (etp02.etp.prod.on.blackberry
	[172.16.147.237])
	by smtprelay01.etp.prod.on.blackberry (8.12.9/8.12.9) with ESMTP id
	h7VMxtAg013832
	for <ben at slivka.com>; Sun, 31 Aug 2003 19:02:28 -0400 (EDT)
Date: Sun, 31 Aug 2003 19:02:28 -0400 (EDT)
Message-Id: <200308312302.h7VMxtAg013832 at smtprelay01.etp.prod.on.blackberry>
From: etp at etp02.etp.na.blackberry.net
Subject: RIM_bca28a80-e9c0-11d1-87fe-00600811c6a2
To: ben at slivka.com
MIME-Version: 1.0
Content-Type: MULTIPART/mixed; BOUNDARY="-1824071167-23478-1062370948=:1404"
Return-Path: etp at etp02.etp.na.blackberry.net
X-OriginalArrivalTime: 31 Aug 2003 23:06:28.0062 (UTC)
	FILETIME=[8215A7E0:01C37014]

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="Generator" CONTENT="MS Exchange Server version 6.0.6249.1">
<TITLE>RIM_bca28a80-e9c0-11d1-87fe-00600811c6a2</TITLE>
</HEAD>
<BODY>
<!-- Converted from text/plain format -->

<P><FONT SIZE=2>This message is used to carry data 
between the BlackBerry Redirector and BlackBerry 
Handheld. Please do not delete, move or respond to this 
message - it will be processed by the BlackBerry 
Redirector.</FONT></P>

</BODY>
</HTML>
This message is used to carry data between the 
BlackBerry Redirector and BlackBerry Handheld. Please do 
not delete, move or respond to this message - it will be 
processed by the BlackBerry Redirector.
Message Tokens:

47 unique tokens

'and'
'between'
'blackberry'
'carry'
'cc:none'
'content-type:text/plain'
'data'
'delete,'
'from:addr:etp'
'from:addr:etp02.etp.na.blackberry.net'
'from:no real name:2**0'
'handheld.'
'header:Date:1'
'header:From:1'
'header:MIME-Version:1'
'header:Message-Id:1'
'header:Received:3'
'header:Return-Path:1'
'header:Subject:1'
'header:To:1'
'message'
'message-id:@smtprelay01.etp.prod.on.blackberry'
'move'
'not'
'please'
'processed'
'redirector'
'redirector.'
'reply-to:none'
'respond'
'sender:none'
'skip:r 40'
'subject:-'
'subject:00600811c6a2'
'subject:11d1'
'subject:87fe'
'subject:RIM_bca28a80'
'subject:e9c0'
'the'
'this'
'to:2**0'
'to:addr:ben'
'to:addr:slivka.com'
'to:no real name:2**0'
'used'
'will'
'x-mailer:none'
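The "Spam Score" at the top of this report is derived from the '*H*' and '*S*' rows of the clue table. A minimal sketch of that chi-squared combining step follows, assuming the report was produced with SpamBayes' chi-squared scheme; the function names are ours, and summing logs instead of multiplying raw probabilities is our own simplification to avoid underflow. The last step can be checked by hand against the report: (0.0779066 - 0.663497 + 1) / 2 = 0.207205.

```python
import math


def chi2Q(x2, v):
    """Chi-squared survival function P(chisq >= x2) for even degrees of freedom v."""
    assert v % 2 == 0
    m = x2 / 2.0
    total = term = math.exp(-m)
    # Closed form for even v: exp(-m) * sum_{i < v/2} m**i / i!
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def combine_SH(S, H):
    """Map the spam and ham indicators onto [0, 1]; 0.5 means 'unsure'."""
    return (S - H + 1.0) / 2.0


def chi2_combine(probs):
    """Combine per-clue spam probabilities (each strictly in (0, 1)).

    Returns (score, S, H).
    """
    n = len(probs)
    log_not_spam = sum(math.log(1.0 - p) for p in probs)
    log_spam = sum(math.log(p) for p in probs)
    # S near 1: the clues are jointly too spammy to be chance.
    S = 1.0 - chi2Q(-2.0 * log_not_spam, 2 * n)
    # H near 1: the clues are jointly too hammy to be chance.
    H = 1.0 - chi2Q(-2.0 * log_spam, 2 * n)
    return combine_SH(S, H), S, H
```

In this report H is only about 0.66 and S about 0.08, so the score settles near 0.21 instead of near 0. The hammy clues were not strong enough to push the message out of the "maybe spam" range.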


----------------------------------------------------------------------

>Comment By: Tony Meyer (anadelonbrin)
Date: 2004-07-16 15:21

Message:
Logged In: YES 
user_id=552329

As per the comments below, this seems to be fixed.

----------------------------------------------------------------------

Comment By: Benjamin W. Slivka (benslivka)
Date: 2003-09-04 11:47

Message:
Logged In: YES 
user_id=856287

Dear Tim_One:

I set "experimental_ham_spam_imbalance_adjustment: False" 
and that immediately cured the problem -- the Blackberry 
e-mails are staying in the inbox and being processed by the 
Blackberry Redirector.

I don't notice any significant increase in spam-as-ham, either!

Thank you and keep up the great work!
ben at slivka.com
www.slivka.com


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2003-09-02 12:01

Message:
Logged In: YES 
user_id=31435

It's much simpler for you to keep your training data balanced 
than it is for me to add entirely new subsystems to the code 
<wink>.  Try it!  People with balanced training sets don't have 
problems like this.

If I had time to throw at making improvements, I'd be much 
more interested in finding a better way to deal with 
unbalanced training data than fiddling with brittle rule-based 
subsystems (they're always brittle, unless backed by complex 
learning algorithms to adjust weights based on feedback).

It would help if you at least tried setting

experimental_ham_spam_imbalance_adjustment: False

and reported back on what happened.  You don't have to 
retrain after doing that.  Your original complaint should go 
away then.  More interesting is what new problems would 
arise.

----------------------------------------------------------------------

Comment By: Benjamin W. Slivka (benslivka)
Date: 2003-09-01 15:25

Message:
Logged In: YES 
user_id=856287

Dear Tim_One,

I have tons of ham (29,000+) and barely any spam (now up 
to 189 from 80 when I first trained SpamBayes).

Have you considered having some additional filter rules that 
would allow me to indicate that mail from certain domains 
and/or with subject lines matching certain regular expressions 
was SPAM/NOT-SPAM?

I understand this would violate the "pure" Bayesian approach, 
but might be a simple, practical solution?

Thanks!
--ben at slivka.com
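The rules proposed here could look roughly like a pre-classification check that runs before the Bayesian score. This is a hypothetical sketch using only Python's standard `email` and `re` modules. The `RULES` table, its patterns (taken from the BlackBerry message quoted above), and `rule_verdict()` are illustrative and are not a SpamBayes feature.

```python
import re
from email import message_from_string
from email.utils import parseaddr

# HYPOTHETICAL rule table: (header name, pattern, verdict).
# Patterns are illustrative, based on the Redirector message in this report.
RULES = [
    ("From", re.compile(r"@(?:[\w-]+\.)*blackberry\.net$", re.IGNORECASE), "ham"),
    ("Subject", re.compile(r"^RIM_[0-9a-f-]+$", re.IGNORECASE), "ham"),
]


def rule_verdict(raw_message):
    """Return the verdict of the first matching rule, or None.

    None means no rule fired and the message should be scored by the
    Bayesian classifier as usual.
    """
    msg = message_from_string(raw_message)
    for header, pattern, verdict in RULES:
        value = msg.get(header, "")
        if header == "From":
            # Compare only the bare address, not any display name.
            value = parseaddr(value)[1]
        if pattern.search(value.strip()):
            return verdict
    return None
```

Rules like these are hand-maintained. If RIM changes its relay hostnames or subject format, they stop matching without any warning. This is the brittleness that the reply above objects to.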

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2003-09-01 13:35

Message:
Logged In: YES 
user_id=31435

Looks like you trained on a great many more ham msgs than 
spam msgs.  If so, try training on an approximately equal 
number of each.  The code was designed, tested, and tuned 
by people who trained on approximately equal numbers of 
each, and we don't yet have a good approach to dealing
with wildly unbalanced training sets.
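You can follow this advice without discarding the mailbox by training on a random subset of ham that is the same size as the spam collection. A minimal sketch follows. Messages are treated as opaque items, and `balanced_training_set()` is a hypothetical helper, not part of SpamBayes.

```python
import random


def balanced_training_set(ham, spam, seed=None):
    """Return (ham_subset, spam_subset) of equal size.

    The larger class is randomly downsampled to the size of the smaller
    one. The smaller class is kept whole, in shuffled order.
    """
    rng = random.Random(seed)
    n = min(len(ham), len(spam))
    return rng.sample(ham, n), rng.sample(spam, n)
```

With the counts reported in this thread (29,000+ ham, 189 spam), this would train on 189 messages of each class.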

You can find your default_bayes_customize.ini file and change 
the line

experimental_ham_spam_imbalance_adjustment: True

to

experimental_ham_spam_imbalance_adjustment: False

and then these msgs will almost certainly be scored as solid 
ham.  However, you'll probably get a much higher false 
negative rate (spam erroneously classified as ham) then too.
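The trade-off described here shows up in the per-word probability. Below is a sketch of Robinson's adjustment f(w) = (s*x + n*p) / (s + n), which pulls a word's raw probability p toward a prior x with strength s. The discounting of the over-represented class (`spam2ham`/`ham2spam`) is our ASSUMPTION about how an imbalance option could work, not a copy of SpamBayes' code. The values s = 0.45 and x = 0.5 are also assumed defaults.

```python
def word_spamprob(hamcount, spamcount, nham, nspam,
                  s=0.45, x=0.5, imbalance_adjustment=True):
    """Hypothetical per-word spam probability with Robinson's adjustment.

    s: strength of the prior (assumed default 0.45)
    x: prior probability for a word with no evidence (assumed default 0.5)
    """
    if hamcount == 0 and spamcount == 0:
        return x  # no evidence at all: fall back to the prior
    nham = float(nham or 1)
    nspam = float(nspam or 1)
    hamratio = hamcount / nham     # fraction of ham containing the word
    spamratio = spamcount / nspam  # fraction of spam containing the word
    p = spamratio / (hamratio + spamratio)
    if imbalance_adjustment:
        # ASSUMPTION: shrink counts from the larger class by the class-size
        # ratio, so a huge ham corpus cannot make its words look certain.
        spam2ham = min(nspam / nham, 1.0)
        ham2spam = min(nham / nspam, 1.0)
    else:
        spam2ham = ham2spam = 1.0
    n = hamcount * spam2ham + spamcount * ham2spam
    return (s * x + n * p) / (s + n)
```

With balanced training the two settings give identical results. With 29,000 ham and 189 spam, the adjustment keeps a word seen in 3592 ham and 1 spam noticeably closer to 0.5. Turning it off lets such hammy words count at full strength, and that is why the Redirector messages would then score as solid ham.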

----------------------------------------------------------------------
