[spambayes-bugs] [ spambayes-Feature Requests-943116 ] White list
for domains/email addresses
SourceForge.net
noreply at sourceforge.net
Thu Apr 29 15:31:53 EDT 2004
Feature Requests item #943116, was opened at 2004-04-27 09:04
Message generated for change (Comment added) made by darklaser
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498106&aid=943116&group_id=61702
Category: pop3proxy
Group: None
Status: Open
Priority: 5
Submitted By: DarkLaser (darklaser)
Assigned to: Nobody/Anonymous (nobody)
Summary: White list for domains/email addresses
Initial Comment:
A nice feature would be to have a domain/email
address white list where you could specify email
addresses which should be marked as ham without
regard to content. It would also be nice to be able to
say anything from the domain belonging to the company
I work for should also be marked as ham regardless of
content.
Anyway, my 2bits.
Thanks,
David
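
For illustration only, the kind of check requested above could be sketched as follows. This is not an existing SpamBayes feature; the function name `is_whitelisted` and its parameters are hypothetical, and the From header it relies on can be forged by the sender.

```python
from email.utils import parseaddr


def is_whitelisted(from_header, addresses=(), domains=()):
    """Return True if the sender matches a whitelisted address or domain.

    Hypothetical helper, not part of SpamBayes. It trusts the From
    header, which a spammer can forge.
    """
    _, addr = parseaddr(from_header)
    addr = addr.lower()
    if "@" not in addr:
        return False
    # Exact address match first.
    if addr in {a.lower() for a in addresses}:
        return True
    # Then match the domain itself or any of its subdomains.
    domain = addr.rsplit("@", 1)[1]
    return any(
        domain == d.lower() or domain.endswith("." + d.lower())
        for d in domains
    )


print(is_whitelisted("David <david@example.com>", domains=["example.com"]))
print(is_whitelisted("someone@elsewhere.test", domains=["example.com"]))
```

A message for which this returns True would be marked as ham without being scored; everything else would go through the normal classifier.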
----------------------------------------------------------------------
>Comment By: DarkLaser (darklaser)
Date: 2004-04-29 12:31
Message:
Logged In: YES
user_id=1030399
Yes, I had stored up spam from the last 8 months in case I
came across a Bayesian filter I wanted to train with it, but it
all came from this one account. The nearly 5,000 valid emails
are all of my valid email for this account over the last 5 years.
So I would have thought it should have worked very well.
Perhaps SpamBayes learns from initial training sets differently
than from email it processes as it comes in. So perhaps the
solution is to remove the past training and just start training
from scratch.
Yes, I'll wait for a few more false positives, and I'll post them
before wiping it out and starting over.
David
----------------------------------------------------------------------
Comment By: Kenny Pitt (kpitt)
Date: 2004-04-29 11:49
Message:
Logged In: YES
user_id=859086
It's very unusual to hear from someone who is getting
accuracy this poor. Could you upload a copy of the spam
clues for a false positive message (before training on it)?
Seeing why SpamBayes thought the way it did when it first
processed the message would help a lot.
I notice that you have far more training data than you have
messages that have been processed by SpamBayes, so I
assume you had a large initial training set. Is it possible that
your training data was not representative of the messages
that you are currently receiving?
Although there is no proven best training strategy, in general
SpamBayes seems to perform best if you initially train with
only 5 or 10 of each type of message and then train it up on
your current message stream instead of training it on lots of
outdated messages. You'll also find that SpamBayes is more
responsive to training of new messages when you have fewer
messages in the training database. With the large number of
messages that you have, it will take a *LOT* of training to
overcome existing clues.
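
To illustrate why a large training database responds slowly to new training, here is a minimal sketch using a simplified, smoothed per-token estimate in the style of Gary Robinson's formula. It is not SpamBayes's actual code. The `strength` and `prior` values and the count of 400 spam hits are assumptions chosen for illustration; only the totals 9728 and 4939 come from this thread.

```python
def token_spam_prob(spam_hits, ham_hits, nspam, nham, strength=0.45, prior=0.5):
    """Simplified smoothed spam probability for a single token.

    strength and prior are assumed smoothing parameters.
    """
    spam_ratio = spam_hits / nspam if nspam else 0.0
    ham_ratio = ham_hits / nham if nham else 0.0
    total = spam_ratio + ham_ratio
    raw = spam_ratio / total if total else prior
    n = spam_hits + ham_hits
    # Pull the raw estimate toward the prior when there is little evidence.
    return (strength * prior + n * raw) / (strength + n)


def shift_after_one_ham(spam_hits, nspam, nham):
    """How far a spam-only token's probability drops after training
    on one ham message that contains it."""
    before = token_spam_prob(spam_hits, 0, nspam, nham)
    after = token_spam_prob(spam_hits, 1, nspam, nham + 1)
    return before - after


# Totals from the thread; 400 spam hits for the token is made up.
large_shift = shift_after_one_ham(400, 9728, 4939)
# A small database of 10 spam and 10 ham, token seen in 4 spam (made up).
small_shift = shift_after_one_ham(4, 10, 10)

print(f"large database shift: {large_shift:.4f}")
print(f"small database shift: {small_shift:.4f}")
```

Under these assumptions, one corrective ham message moves the token's score far more in the small database than in the large one, which matches the advice above.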
----------------------------------------------------------------------
Comment By: DarkLaser (darklaser)
Date: 2004-04-29 06:46
Message:
Logged In: YES
user_id=1030399
Anadelonbrin, thanks for the URL. I had looked for something
about white lists, but couldn't find it.
I maintain that a white list would be useful. Perhaps not to
some, but very much so for others. I have received maybe 1
or 2 spam claiming to be from someone on the domain for the
company I work for in the last year, and never have received
any claiming to be from any of my 4 personal domains.
However, the current false positive rate is horrible: 85%
of my good email is being falsely marked as spam. Look at
the number of emails I have trained on; with that many, I
should be getting near-perfect results.
-------------------------------------------------
Total emails trained: Spam: 9728 Ham: 4939
SpamBayes has processed 546 messages - 4 (1%) good, 538
(99%) spam and 4 (0%) unsure.
29 messages were manually classified as good (23 were false
positives).
517 messages were manually classified as spam (0 were false
negatives).
2 unsure messages were manually identified as good, and 2 as
spam.
-------------------------------------------------
Ignoring the unsure messages, out of 27 good emails, 4 were
actually marked as good and 23 as spam. That is ridiculous.
Perhaps I need to change something in my settings, but the
majority of those good emails are from this one domain, so in
my case a white list would make a world of difference. If one
or two spam a year get through because of a white list, no
biggie, I can handle that. That's a lot easier than having to
go manually remove the word 'spam,' from the subject of 85%
of my email.
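
For reference, the 85% figure follows from the statistics quoted above. A minimal check of the arithmetic (variable names are illustrative):

```python
# Figures taken from the statistics block quoted in this comment.
manually_good = 29      # messages manually classified as good
unsure_good = 2         # of those, messages SpamBayes had marked unsure
false_positives = 23    # good messages SpamBayes had marked as spam

# Good messages that received a definite good/spam verdict.
good_with_verdict = manually_good - unsure_good
correctly_good = good_with_verdict - false_positives

fp_rate = false_positives / good_with_verdict
print(f"{correctly_good} of {good_with_verdict} good messages marked good")
print(f"false positive rate: {fp_rate:.0%}")
```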
I don't have any experience with Python (I'm a Perl man
myself), otherwise I would look at building a white list to send
to the project manager. Anyway, I still think this item should
remain on the wish list.
Thanks,
David
----------------------------------------------------------------------
Comment By: Tony Meyer (anadelonbrin)
Date: 2004-04-28 18:45
Message:
Logged In: YES
user_id=552329
Please see FAQ 6.6:
<http://spambayes.org/faq.html#why-don-t-you-add-whitelisting-blacklisting-to-spambayes>
----------------------------------------------------------------------