RE: [Spambayes] trained as ham, classified as spam
I have trained the forwarded message as ham, and fed it to sb_filter.py afterwards. It is still classified as spam! Does this mean that it looks like ham, but looks much more like spam? Is there a way to have this false positive classified as ham?
More useful than the message itself would be the clues that it generates with your database. You can get these via the web interface (the classify message box). You should be able to see why it's being classified as spam (i.e. which tokens are making the difference), but you can post them here, too, and someone is bound to comment. It would also help to know how much data you have fed to spambayes - note that it works best with roughly equal numbers of ham and spam, so if this isn't the case, then that might be the problem.

=Tony Meyer
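For anyone reading along who wonders how a handful of clues can decide the outcome: each clue is a per-token spam probability, and the score comes from combining them. Below is a minimal sketch of chi-squared combining, the general scheme spambayes is usually described as using. It is an illustration only, not the project's code; the names chi2q and combine and the clue values are made up for this example.

```python
import math

def chi2q(x2, v):
    """Probability that a chi-squared variable with v (even) degrees
    of freedom is at least x2."""
    assert v % 2 == 0
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def combine(clues):
    """Combine per-token spam probabilities (each strictly between 0 and 1)
    into one score: near 1.0 looks like spam, near 0.0 looks like ham."""
    n = len(clues)
    if n == 0:
        return 0.5
    ln_ham = sum(math.log(p) for p in clues)          # very negative if tokens look hammy
    ln_spam = sum(math.log(1.0 - p) for p in clues)   # very negative if tokens look spammy
    spam_evidence = 1.0 - chi2q(-2.0 * ln_spam, 2 * n)
    ham_evidence = 1.0 - chi2q(-2.0 * ln_ham, 2 * n)
    return (spam_evidence - ham_evidence + 1.0) / 2.0

# Hypothetical clues: six mildly hammy tokens and four very spammy ones.
mixed = [0.3] * 6 + [0.99] * 4
print(combine(mixed))
```

With these invented values the score lands above 0.5, so a few very strong spam clues can outweigh a larger number of mildly hammy ones. That is why the clue list is more informative than the message text.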
Meyer, Tony wrote:
Subject: RE: [Spambayes] trained as ham, classified as spam
Date: Thu, 2 Oct 2003 01:40:48 +0200
I have trained the forwarded message as ham, and fed it to sb_filter.py afterwards. It is still classified as spam! Does this mean that it looks like ham, but looks much more like spam? Is there a way to have this false positive classified as ham?
More useful than the message itself would be the clues that it generates with your database. You can get these via the web interface (the classify message box).
Ah, this is indeed useful. I did not know this was possible; I thought the web interface was only for POP. I have been able to answer the question for myself now: it was a message that my ISP had wrongly flagged as containing a virus, and all the messages that were rightly flagged that way have been trained as spam. Quite logical, now that I think about it.
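A rough sketch of why training the message once as ham did not flip the result. It uses a Robinson-style per-token estimate; the function name token_spam_prob, the prior settings s=0.45 and x=0.5, and all the counts are assumptions chosen for illustration, not values read from a real database.

```python
def token_spam_prob(spam_count, ham_count, nspam, nham, s=0.45, x=0.5):
    """Estimate how spammy a token is.

    spam_count / ham_count: trained messages of each kind containing the token.
    nspam / nham: total trained messages of each kind.
    s, x: assumed prior strength and prior probability for rarely seen tokens.
    """
    spam_ratio = spam_count / nspam if nspam else 0.0
    ham_ratio = ham_count / nham if nham else 0.0
    n = spam_count + ham_count
    if n == 0 or spam_ratio + ham_ratio == 0:
        return x  # nothing known about this token
    p = spam_ratio / (spam_ratio + ham_ratio)
    # Pull p towards x when there is little evidence (small n).
    return (s * x + n * p) / (s + n)

# Hypothetical token added by the ISP's virus scanner:
# seen in 30 of 600 trained spam and in none of 400 trained ham.
before = token_spam_prob(30, 0, 600, 400)
# The same token after one false positive is trained as ham.
after = token_spam_prob(30, 1, 600, 401)
print(before, after)
```

In this made-up case the estimate drops only slightly and stays well above 0.9, because one ham message is weighed against thirty spam messages carrying the same token.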
You should be able to see why it's being classified as spam (i.e. which tokens are making the difference), but you can post them here, too, and someone is bound to comment. It would also help to know how much data you have fed to spambayes - note that it works best with roughly equal numbers of ham and spam, so if this isn't the case, then that might be the problem.
Well, it is roughly equal, although I don't know how rough "roughly" is. I trained with a little more spam than ham: probably about 60% spam. Maybe it would be best to restart training, because although the ham and spam were evenly spaced, a lot of the trained spam was similar. I guess I have to learn to work with spambayes better. Thanks for your answer!

Gerrit.

--
204. If a freed man strike the body of another freed man, he shall pay ten shekels in money.
        -- 1780 BC, Hammurabi, Code of Law
--
Asperger Syndrome - a personal approach: http://people.nl.linux.org/~gerrit/
These are times to get involved in politics yourself: http://www.sp.nl/
participants (2): Gerrit Holl; Meyer, Tony