[spambayes-bugs] [ spambayes-Support Requests-798318 ] Blackberry
redirector e-mails identified as (maybe-)SPAM
SourceForge.net
noreply at sourceforge.net
Fri Jul 16 05:21:56 CEST 2004
Support Requests item #798318, was opened at 2003-09-01 12:09
Message generated for change (Comment added) made by anadelonbrin
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498104&aid=798318&group_id=61702
Category: None
Group: None
>Status: Closed
Priority: 5
Submitted By: Benjamin W. Slivka (benslivka)
Assigned to: Nobody/Anonymous (nobody)
Summary: Blackberry redirector e-mails identified as (maybe-)SPAM
Initial Comment:
Hi!
I installed SpamBayes 0.7 on my Windows XP Pro SP1
system late Thursday evening 8/28. Here it is Sunday
8/31, and SB is working very nicely!
BUT, it doesn't seem to have learned that RIM /
Blackberry Redirector e-mails are non-spam. It pretty
regularly puts them in the "maybe spam" folder.
I've included the detailed "spam report" on one such
message below.
Otherwise, SB is working great with my Blackberry -- if it
identifies an inbox message as Spam it moves it out of
the inbox before the Blackberry Redirector has a chance
to send it to my Blackberry!
Please let me know if I can provide more information.
Thank you!
--Ben Slivka
Ben [at] Slivka [dot] com
Clyde Hill, WA (near Seattle)
Spam Score: 0.207205
word                        spamprob     #ham   #spam
'*H*'                       0.663497        -       -
'*S*'                       0.0779066       -       -
'between'                   0.0221876    2566       0
'blackberry'                0.0815819    3592       1
'processed'                 0.249116      120       0
'move'                      0.264701     1577       2
'data'                      0.270535     1529       2
'header:Message-Id:1'       0.302488     5573       9
'used'                      0.318698     2332       4
'not'                       0.37607     14097      32
'reply-to:none'             0.390848    27683      67
'to:addr:ben'               0.60174     18381     105
'header:MIME-Version:1'     0.614831    13414      81
'to:addr:slivka.com'        0.616083    17303     105
'from:no real name:2**0'    0.669564     2714      21
'to:no real name:2**0'      0.687821    12466     104
'header:Return-Path:1'      0.744835     9939     110
'carry'                     0.779405      332       5
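
A note for readers checking the arithmetic: the '*H*' and '*S*' rows look like the combined ham and spam evidence, and the reported Spam Score is consistent with (S - H + 1) / 2. The following is a minimal Python sketch under that assumption; the function name `combined_score` is made up for illustration and is not taken from the SpamBayes source.

```python
def combined_score(h, s):
    """Fold a ham indicator h and a spam indicator s into one score in [0, 1].

    Assumed combining rule: equal evidence on both sides gives 0.5,
    pure ham evidence gives 0.0, pure spam evidence gives 1.0.
    """
    return (s - h + 1.0) / 2.0


# Values copied from the '*H*' and '*S*' rows of the report above.
score = combined_score(h=0.663497, s=0.0779066)
print(f"{score:.6f}")  # 0.207205
```

The ham evidence clearly outweighs the spam evidence here, yet the score is still not close to zero, which matches the "maybe spam" placement described in the report.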
Message Stream:
X-MS-Mail-Gibberish: Microsoft Mail Internet Headers Version 2.0
Received: from BlackBerry.NET ([206.51.26.40]) by janus.slivka.org with
    Microsoft SMTPSVC(5.0.2195.6713); Sun, 31 Aug 2003 16:06:27 -0700
Received: from smtprelay01.etp.prod.on.blackberry
    (smtprelay01.etp.prod.on.blackberry [172.16.147.240])
    by BlackBerry.NET (8.12.9+Sun/8.12.9) with ESMTP id h7VN6hvs008489
    for <ben at slivka.com>; Sun, 31 Aug 2003 19:06:44 -0400 (EDT)
Received: from etp02.etp.prod.on.blackberry
    (etp02.etp.prod.on.blackberry [172.16.147.237])
    by smtprelay01.etp.prod.on.blackberry (8.12.9/8.12.9) with ESMTP id
    h7VMxtAg013832
    for <ben at slivka.com>; Sun, 31 Aug 2003 19:02:28 -0400 (EDT)
Date: Sun, 31 Aug 2003 19:02:28 -0400 (EDT)
Message-Id: <200308312302.h7VMxtAg013832 at smtprelay01.etp.prod.on.blackberry>
From: etp at etp02.etp.na.blackberry.net
Subject: RIM_bca28a80-e9c0-11d1-87fe-00600811c6a2
To: ben at slivka.com
MIME-Version: 1.0
Content-Type: MULTIPART/mixed;
    BOUNDARY="-1824071167-23478-1062370948=:1404"
Return-Path: etp at etp02.etp.na.blackberry.net
X-OriginalArrivalTime: 31 Aug 2003 23:06:28.0062 (UTC)
    FILETIME=[8215A7E0:01C37014]
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type"
CONTENT="text/html; charset=iso-8859-1">
<META NAME="Generator" CONTENT="MS Exchange
Server version 6.0.6249.1">
<TITLE>RIM_bca28a80-e9c0-11d1-87fe-00600811c6a2</TITLE>
</HEAD>
<BODY>
<!-- Converted from text/plain format -->
<P><FONT SIZE=2>This message is used to carry data
between the BlackBerry Redirector and BlackBerry
Handheld. Please do not delete, move or respond to this
message - it will be processed by the BlackBerry
Redirector.</FONT></P>
</BODY>
</HTML>
This message is used to carry data between the
BlackBerry Redirector and BlackBerry Handheld. Please do
not delete, move or respond to this message - it will be
processed by the BlackBerry Redirector.
Message Tokens:
47 unique tokens
'and'
'between'
'blackberry'
'carry'
'cc:none'
'content-type:text/plain'
'data'
'delete,'
'from:addr:etp'
'from:addr:etp02.etp.na.blackberry.net'
'from:no real name:2**0'
'handheld.'
'header:Date:1'
'header:From:1'
'header:MIME-Version:1'
'header:Message-Id:1'
'header:Received:3'
'header:Return-Path:1'
'header:Subject:1'
'header:To:1'
'message'
'message-id:@smtprelay01.etp.prod.on.blackberry'
'move'
'not'
'please'
'processed'
'redirector'
'redirector.'
'reply-to:none'
'respond'
'sender:none'
'skip:r 40'
'subject:-'
'subject:00600811c6a2'
'subject:11d1'
'subject:87fe'
'subject:RIM_bca28a80'
'subject:e9c0'
'the'
'this'
'to:2**0'
'to:addr:ben'
'to:addr:slivka.com'
'to:no real name:2**0'
'used'
'will'
'x-mailer:none'
----------------------------------------------------------------------
>Comment By: Tony Meyer (anadelonbrin)
Date: 2004-07-16 15:21
Message:
Logged In: YES
user_id=552329
As per comment, this seems to be fixed.
----------------------------------------------------------------------
Comment By: Benjamin W. Slivka (benslivka)
Date: 2003-09-04 11:47
Message:
Logged In: YES
user_id=856287
Dear Tim_One:
I set "experimental_ham_spam_imbalance_adjustment: False"
and that immediately cured the problem -- the Blackberry
e-mails are staying in the inbox and being processed by the
Blackberry Redirector.
I don't notice any significant increase in spam-as-ham, either!
Thank you and keep up the great work!
ben at slivka.com
www.slivka.com
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2003-09-02 12:01
Message:
Logged In: YES
user_id=31435
It's much simpler for you to keep your training data balanced
than it is for me to add entirely new subsystems to the code
<wink>. Try it! People with balanced training sets don't have
problems like this.
If I had time to throw at making improvements, I'd be much
more interested in finding a better way to deal with
unbalanced training data than fiddling with brittle rule-based
subsystems (they're always brittle, unless backed by complex
learning algorithms to adjust weights based on feedback).
It would help if you at least tried setting
experimental_ham_spam_imbalance_adjustment: False
and reported back on what happened. You don't have to
retrain after doing that. Your original complaint should go
away then. More interesting is what new problems would
arise.
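
One way to act on the balanced-training advice in this comment is to down-sample the larger class before training. The sketch below is only an illustration: `balance` is a hypothetical helper, the message IDs are placeholders, and the counts (29,000 ham, 189 spam) are taken from the figures quoted elsewhere in this thread. It does not call any SpamBayes API.

```python
import random


def balance(ham, spam, seed=0):
    """Return (ham, spam) subsets of equal size by down-sampling the larger class."""
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    n = min(len(ham), len(spam))
    return rng.sample(ham, n), rng.sample(spam, n)


# Placeholder message IDs standing in for real training messages.
ham_ids = [f"ham-{i}" for i in range(29000)]
spam_ids = [f"spam-{i}" for i in range(189)]

ham_train, spam_train = balance(ham_ids, spam_ids)
print(len(ham_train), len(spam_train))  # 189 189
```

Random sampling discards most of the ham, so a real setup might instead prefer recent messages or ones the filter scored as unsure; that choice is outside what this thread discusses.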
----------------------------------------------------------------------
Comment By: Benjamin W. Slivka (benslivka)
Date: 2003-09-01 15:25
Message:
Logged In: YES
user_id=856287
Dear Tim_One,
I have tons of ham (29,000+) and barely any spam (now up
to 189 from 80 when I first trained SpamBayes).
Have you considered having some additional filter rules that
would allow me to indicate that mail from certain domains
and/or with certain reg_exp subject lines was SPAM/NOT-SPAM?
I understand this would violate the "pure" Bayesian approach,
but might be a simple, practical solution?
Thanks!
--ben at slivka.com
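
To make the proposal in this comment concrete, here is a rough Python sketch of a rule-based pre-filter that forces a "ham" verdict for mail from a listed domain with a matching subject line. Everything in it is hypothetical: `HAM_DOMAINS`, `HAM_SUBJECT`, and `forced_verdict` are illustrative names, the subject pattern is inferred from the single sample message in this report, and nothing like this is claimed to exist in SpamBayes.

```python
import re
from email.message import Message
from email.utils import parseaddr

# Hypothetical rules: sender domains and a subject pattern treated as known ham.
HAM_DOMAINS = ("blackberry.net",)
HAM_SUBJECT = re.compile(r"^RIM_[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$")


def forced_verdict(msg):
    """Return "ham" if the message matches a rule, else None (defer to the classifier)."""
    _, addr = parseaddr(str(msg.get("From", "")))
    domain = addr.rpartition("@")[2].lower()
    from_ok = any(domain == d or domain.endswith("." + d) for d in HAM_DOMAINS)
    subject_ok = bool(HAM_SUBJECT.match(str(msg.get("Subject", "")).strip()))
    # Require both conditions so a forged subject alone is not enough.
    return "ham" if from_ok and subject_ok else None


redirector = Message()
redirector["From"] = "etp@etp02.etp.na.blackberry.net"
redirector["Subject"] = "RIM_bca28a80-e9c0-11d1-87fe-00600811c6a2"
print(forced_verdict(redirector))  # ham
```

Rules of this kind have to be maintained by hand whenever the sender changes its domain or subject format.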
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2003-09-01 13:35
Message:
Logged In: YES
user_id=31435
Looks like you trained on a great many more ham msgs than
spam msgs. If so, try training on an approximately equal
number of each. The code was designed, tested, and tuned
by people who trained on approximately equal numbers of
each, and we don't yet have a good approach to dealing
with wildly unbalanced training sets.
You can find your default_bayes_customize.ini file and change
the line
experimental_ham_spam_imbalance_adjustment: True
to
experimental_ham_spam_imbalance_adjustment: False
and then these msgs will almost certainly be scored as solid
ham. However, you'll probably get a much higher false
negative rate (spam erroneously classified as ham) then too.
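
The effect of this option on a single token can be sketched in Python. This is an approximation under stated assumptions, not the SpamBayes source: the function name `spamprob`, the smoothing constants 0.45 and 0.5, and the training totals of 29,000 ham and 110 spam are all assumed for illustration (the totals are guesses based on the counts mentioned in this thread). With the adjustment on, the evidence from the over-represented class is scaled down before the estimate is smoothed.

```python
def spamprob(hamcount, spamcount, nham, nspam, adjust=True,
             strength=0.45, unknown_prob=0.5):
    """Estimate a token's spam probability from its training counts.

    Assumes the token was seen at least once, so the ratios below
    do not both come out as zero.
    """
    if adjust:
        # Discount evidence from whichever class has more training messages.
        spam2ham = min(nspam / nham, 1.0)
        ham2spam = min(nham / nspam, 1.0)
    else:
        spam2ham = ham2spam = 1.0
    hamratio = hamcount / nham
    spamratio = spamcount / nspam
    raw = spamratio / (hamratio + spamratio)
    # Effective number of observations backing the raw estimate.
    n = hamcount * spam2ham + spamcount * ham2spam
    # Shrink toward unknown_prob when there is little effective evidence.
    return (strength * unknown_prob + n * raw) / (strength + n)


NHAM, NSPAM = 29000, 110  # assumed training totals, not stated in the report
for word, ham, spam in [("between", 2566, 0), ("blackberry", 3592, 1)]:
    on = spamprob(ham, spam, NHAM, NSPAM, adjust=True)
    off = spamprob(ham, spam, NHAM, NSPAM, adjust=False)
    print(f"{word:12s} adjusted={on:.4f} unadjusted={off:.4f}")
```

Under these assumptions the adjusted values come out close to the 0.022 and 0.082 shown for 'between' and 'blackberry' in the clue report, and turning the adjustment off pushes both toward zero, in line with the prediction that the messages would then score as solid ham.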
----------------------------------------------------------------------