[Spambayes] Whitelist for SpamBayes for Outlook
seant at iname.com
Fri Sep 12 15:09:01 EDT 2003
Dave is using SpamAtBay, which accounts for the differences in display.
The statistics are heavily weighted by the 30K starter set that is included
in SAB for people who don't have much to train from and are in a hurry.
That is, everyone.
We preset thresholds to 10 and 90 by default.
The next version of SAB (1.1) will have a much lighter set of statistics
that are derived from more messages; these adapt significantly faster.
I ran the SpamAssassin corpus on these new models with a "correct every
training paradigm. It got _very_ good results.
> -----Original Message-----
> From: spambayes-bounces at python.org
> [mailto:spambayes-bounces at python.org] On Behalf Of Tim Peters
> Sent: Friday, September 12, 2003 2:41 PM
> To: David Matos
> Cc: spambayes at python.org
> Subject: RE: [Spambayes] Whitelist for SpamBayes for Outlook
> [David Matos]
> > Thanks for the reply, Skip! Here are the spam clues for an e-mail
> How did you get these? The subject line here contains "Outlook", and this
> looks vaguely like an Outlook addin "Show spam clues for current message"
> report, but there are many differences from how that report looks in any
> reasonably recent version of the addin. So if you're using an old version
> of the addin, the first thing to do is to update to a current version.
> > that is representative of the kind that I keep seeing filtered as
> > "Unsure," despite training lots of messages from this sender:
> > Spam score: 12.55%
> That would have scored as ham for me. What do you have your ham cutoff set
> to? My ham cutoff is set to 20, and my spam cutoff to 80. If most of the
> msgs from this dude score under 20, boosting your ham cutoff to 20 will
> solve most of your problem.
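To make the cutoff point concrete, here is a small Python sketch of how a score maps to a folder under the two threshold pairs mentioned in this thread (20/80 and 10/90). The function name `classify` and the handling of scores that land exactly on a cutoff are assumptions for illustration, not the addin's actual code.

```python
def classify(score, ham_cutoff=20.0, spam_cutoff=80.0):
    """Map a spam score (in percent) to 'ham', 'unsure', or 'spam'.

    Assumption: scores exactly on a cutoff are treated as unsure.
    """
    if score < ham_cutoff:
        return "ham"
    if score > spam_cutoff:
        return "spam"
    return "unsure"


# The message in question scored 12.55%.
print(classify(12.55))                                  # cutoffs 20/80 -> ham
print(classify(12.55, ham_cutoff=10, spam_cutoff=90))   # cutoffs 10/90 -> unsure
```

With a ham cutoff of 10, a 12.55% score falls in the unsure band; with a ham cutoff of 20 it is filed as ham.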
> > Word                                Probability  # Good  # Spam
> > '*H*'                                    99.97%       -       -
> > '*S*'                                    25.07%       -       -
> > 'from:addr:adjoined.com'                  0.42%      53       0
> > 'from:addr:operez'                        0.42%      53       0
> > 'from:name:oscar perez'                   0.42%      53       0
> > 'message-id:@smmia001.adjoined.net'       0.43%      52       0
> There end all the clues that your friend sent the message. Enabling the
> mine-received-headers option would generate more. The rest of his message
> consists of a URL, so there's not much else to go on, and what there is
> looks more spammy than not (relative to everything else you've trained on).
> > 'url:story2'                              5.06%       4       0
> > 'subject:Goes'                            6.52%       3       0
> > 'subject:Yahoo'                           6.79%      16       1
> > 'url:story'                              11.58%      38       5
> > 'subject:News'                           13.23%     133      21
> > 'url:ncid'                               15.52%       1       0
> > 'url:tmpl'                               18.96%      13       3
> > 'url:news'                               20.12%     183      48
> > 'header:Received:1'                      29.01%     560     239
> > 'proto:http'                             61.93%    9835   16725
> > 'x-mailer:none'                          62.77%    5776   10180
> > 'subject:; '                             65.23%      31      61
> > 'url:com'                                65.87%    6350   12812
> > 'url:u'                                  72.92%     164     462
> > 'url:yahoo'                              73.20%     382    1091
> > 'header:Message-ID:1'                    73.53%      98     285
> > 'url:cid'                                74.11%      55     165
> > 'url:'                                   74.96%    1368    4282
> > 'subject:! '                             89.32%      87     762
> > 'url:eo'                                 99.78%       0     102
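As a sanity check, several numbers in the clue report can be reproduced by hand. The following is a rough Python sketch, not the addin's code. It assumes a Robinson-style adjustment with prior strength s = 0.45 and unknown-word probability x = 0.5, and a combining rule of score = (S - H + 1) / 2. The function names are invented for illustration. Tokens seen in both ham and spam also need the total trained message counts, which the report does not show, so the sketch handles only one-sided tokens.

```python
def adjusted_prob(good, spam, s=0.45, x=0.5):
    """Adjusted spam probability for a token seen only in ham or only in spam.

    s (prior strength) and x (unknown-word probability) are assumed values.
    """
    if good and spam:
        raise ValueError("two-sided tokens need total ham/spam message counts")
    n = good + spam                  # number of trained messages containing the token
    raw = 1.0 if spam else 0.0       # one-sided tokens have a raw probability of 0 or 1
    return (s * x + n * raw) / (s + n)


def combined_score(h, s):
    """Combine the '*H*' and '*S*' indicators into a single score in [0, 1]."""
    return (s - h + 1.0) / 2.0


print(f"{adjusted_prob(53, 0):.2%}")            # 'from:addr:adjoined.com', report shows 0.42%
print(f"{adjusted_prob(52, 0):.2%}")            # 'message-id:@smmia001...', report shows 0.43%
print(f"{adjusted_prob(0, 102):.2%}")           # 'url:eo', report shows 99.78%
print(f"{combined_score(0.9997, 0.2507):.2%}")  # overall score, report shows 12.55%
```

Under these assumptions the one-sided clues and the overall 12.55% score come out as reported, which suggests that a single strong spam clue such as 'url:eo' is offset here by the four sender-related ham clues.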
> > Message text
> > Received: from mail.adjoined.com ([18.104.22.168])
> > by rwcrmxc11.comcast.net (rwcrmxc11) with ESMTP
> > id <20030912123209r1100769pqe>; Fri, 12 Sep 2003 12:32:09 +0000
> > X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0
> > content-class: urn:content-classes:message
> > MIME-Version: 1.0
> > Subject: Yahoo! News - Batman Goes Psycho; Bale Cast
> > Date: Fri, 12 Sep 2003 08:32:07 -0400
> > Message-ID:
> > <7F1B2E8605ED6B4EA45F13F991588A3D033B998F at smmia001.adjoined.net>
> > X-MS-Has-Attach: yes
> > X-MS-TNEF-Correlator:
> > Thread-Topic: Yahoo! News - Batman Goes Psycho; Bale Cast
> > Thread-Index: AcN5KeDy9XDeto2eSlaMzzKMVyFNPA==
> > From: "Oscar Perez" <operez at adjoined.com>
> > To: <matos at attbi.com>
> > <<Yahoo! News - Batman Goes Psycho; Bale Cast.url>>
> > http://story.news.yahoo.com/news?tmpl=story2&cid=794&u=/eo/12487&ncid=
> If this is ham to you, it may be the best argument for a whitelist I've seen