[spambayes-bugs] [ spambayes-Feature Requests-943116 ] White list for domains/email addresses

Thu Apr 29 15:31:53 EDT 2004

Feature Requests item #943116, was opened at 2004-04-27 09:04
Message generated for change (Comment added) made by darklaser
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498106&aid=943116&group_id=61702

Category: pop3proxy
Group: None
Status: Open
Priority: 5
Submitted By: DarkLaser (darklaser)
Assigned to: Nobody/Anonymous (nobody)
Summary: White list for domains/email addresses

Initial Comment:
      A nice feature would be to have a domain/email 
address white list where you could specify email 
addresses which should be marked as ham without 
regard to content.  It would also be nice to be able to 
say anything from the domain belonging to the company 
I work for should also be marked as ham regardless of 
content.
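
As a rough illustration of the request above, a white list check could look something like the sketch below. It is not part of SpamBayes; the function name `is_whitelisted` and the example addresses and domains are hypothetical. It trusts the From header as written, which a spammer can forge.

```python
from email.utils import parseaddr


def is_whitelisted(from_header, addresses, domains):
    """Return True if the sender address or its domain is on the white list.

    Hypothetical helper: `addresses` and `domains` are sets of lowercase
    strings supplied by the user.
    """
    _, addr = parseaddr(from_header)
    addr = addr.lower()
    if "@" not in addr:
        return False  # missing or unparseable sender address
    if addr in addresses:
        return True
    domain = addr.rsplit("@", 1)[1]
    # Accept the listed domain itself or any subdomain of it.
    return any(domain == d or domain.endswith("." + d) for d in domains)


# Example white list (made-up values).
addresses = {"friend@other.org"}
domains = {"example.com"}

print(is_whitelisted("David <david@example.com>", addresses, domains))
print(is_whitelisted("someone@notexample.com", addresses, domains))
```

A message that passes this check would be treated as ham without being scored; anything else would go through the normal classifier.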

Anyway, my 2bits.

Thanks,
David

----------------------------------------------------------------------

>Comment By: DarkLaser (darklaser)
Date: 2004-04-29 12:31

Message:
Logged In: YES 
user_id=1030399

Yes, I had stored up spam from the last 8 months in case I 
came across a Bayesian filter I wanted to train with it, but it 
all came from this one account.  The nearly 5,000 valid emails 
are all my valid email for this account over the last 5 years.  
So I would have thought it should have worked very well.

Perhaps SpamBayes learns from initial training sets differently 
than from email it processes as it comes in.  So perhaps the 
solution is to remove the past training and just start training 
from scratch.  

Yes, I'll wait for a few more false positives, and I'll post them 
before wiping it out and starting over.

David

----------------------------------------------------------------------

Comment By: Kenny Pitt (kpitt)
Date: 2004-04-29 11:49

Message:
Logged In: YES 
user_id=859086

It's very unusual to hear from someone who is getting 
accuracy this poor.  Could you upload a copy of the spam 
clues for a false positive message (before training on it)?  
Seeing why SpamBayes thought the way it did when it first 
processed the message would help a lot.

I notice that you have far more training data than you have 
messages that have been processed by SpamBayes, so I 
assume you had a large initial training set.  Is it possible that 
your training data was not representative of the messages 
that you are currently receiving?

Although there is no proven best training strategy, in general 
SpamBayes seems to perform best if you initially train with 
only 5 or 10 of each type of message and then train it up on 
your current message stream instead of training it on lots of 
outdated messages.  You'll also find that SpamBayes is more 
responsive to training of new messages when you have fewer 
messages in the training database.  With the large number of 
messages that you have, it will take a *LOT* of training to 
overcome existing clues.
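
The point about responsiveness can be sketched numerically. The code below is a simplified illustration of Robinson-style token smoothing, not SpamBayes's actual code. The defaults `s=0.45` and `x=0.5` are assumed values for the prior strength and prior probability, and the token counts are made up.

```python
def token_spam_prob(spam_count, ham_count, nspam, nham, s=0.45, x=0.5):
    """Smoothed probability that a message containing the token is spam.

    spam_count/ham_count: trained messages of each kind containing the token.
    nspam/nham: total trained messages of each kind.
    s, x: assumed strength and value of the prior for unseen tokens.
    """
    spam_ratio = spam_count / nspam if nspam else 0.0
    ham_ratio = ham_count / nham if nham else 0.0
    if spam_ratio + ham_ratio == 0:
        return x  # token never seen: fall back to the prior
    p = spam_ratio / (spam_ratio + ham_ratio)
    n = spam_count + ham_count
    return (s * x + n * p) / (s + n)


# A hypothetical token (say, a sender domain) that so far appeared only in
# spam, right after ONE new ham message containing it has been trained.
small = token_spam_prob(spam_count=3, ham_count=1, nspam=10, nham=11)
large = token_spam_prob(spam_count=300, ham_count=1, nspam=9728, nham=4940)
print(f"small database: {small:.2f}")
print(f"large database: {large:.2f}")
```

Under these assumptions, one corrected message moves the token to about 0.74 in the small database, while in the large one it stays at about 0.99.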

----------------------------------------------------------------------

Comment By: DarkLaser (darklaser)
Date: 2004-04-29 06:46

Message:
Logged In: YES 
user_id=1030399

Anadelonbrin, thanks for the URL.  I had looked for something 
about white lists, but couldn’t find it.  

I maintain that a white list would be useful.  Perhaps not to 
some, but very much so for others.  I have received maybe 1 
or 2 spam claiming to be from someone on the domain for the 
company I work for in the last year, and never have received 
any claiming to be from any of my 4 personal domains.  
However, the current false positive ratio is horrible: about 85% 
of my good email is falsely being marked as spam.  Look at 
the number of emails I have trained.  With that many trained, I 
should be getting near-perfect results.
-------------------------------------------------
Total emails trained: Spam: 9728 Ham: 4939
SpamBayes has processed 546 messages - 4 (1%) good, 538 
(99%) spam and 4 (0%) unsure.
29 messages were manually classified as good (23 were false 
positives).
517 messages were manually classified as spam (0 were false 
negatives).
2 unsure messages were manually identified as good, and 2 as 
spam.
-------------------------------------------------
Ignoring the unsure messages, out of 27 good emails, 4 were 
actually marked as good and 23 as spam.  That is ridiculous.  
Perhaps I need to change something in my settings, but the 
majority of those good emails are from this one domain, so in 
my case a white list would make a world of difference.  If one 
or two spam a year get through because of a white list, no 
biggie, I can handle that.  That’s a lot easier than having to 
go manually remove the word 'spam,' from the subject of 85% 
of my email.  
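
The 85% figure follows directly from the statistics quoted above. A minimal check of the arithmetic, using only those numbers (the variable names are illustrative):

```python
# Good messages, excluding the 2 that SpamBayes marked as unsure.
good_marked_good = 4
good_marked_spam = 23  # false positives

total_good = good_marked_good + good_marked_spam  # 27
fp_rate = good_marked_spam / total_good

print(f"{good_marked_spam} of {total_good} good messages marked as spam: {fp_rate:.0%}")
```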

I don’t have any experience with Python (I’m a Perl man 
myself), otherwise I would look at building a white list to send 
to the project manager.  Anyway, I still think this item should 
remain on the wish list.

Thanks,
David

----------------------------------------------------------------------

Comment By: Tony Meyer (anadelonbrin)
Date: 2004-04-28 18:45

Message:
Logged In: YES 
user_id=552329

Please see FAQ 6.6:

<http://spambayes.org/faq.html#why-don-t-you-add-whitelisting-blacklisting-to-spambayes>

----------------------------------------------------------------------
