[Spambayes] date for new release to handle image spam?
skip at pobox.com
Fri Feb 2 16:23:00 CET 2007
Seth> The word salad they use to drown out significant clues generally
Seth> fails, but if they throw enough words at it, they sometimes dilute
Seth> the spam clues sufficiently. The fact that they throw hundreds of
Seth> "noise" words at the filters for every spam clue they want to hide
Seth> and Bayesian filters still catch half or three-quarters of it
Seth> shows how powerful the Bayesian approach really is....
Hmmm... Could we do something to measure the amount of word salad without
penalizing large non-image emails?
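One possible measure, sketched here under loose assumptions: rather than counting words, look at the fraction of a message's distinct tokens that carry almost no evidence either way. A token counts as noise if the database doesn't know it or if its spam probability sits near 0.5. Because the score is a ratio and not a count, a long legitimate message is not penalized just for being long. The `probs` mapping is a hypothetical stand-in for the classifier's token database, the regex is a crude placeholder for the real SpamBayes tokenizer, and the `band` default of 0.1 is an arbitrary guess. Whether this actually separates word salad from ordinary long mail is untested.

```python
import re
from typing import Mapping

# Crude placeholder tokenizer (an assumption); the real SpamBayes tokenizer
# is far richer than a single regex.
TOKEN_RE = re.compile(r"[a-z0-9$'-]+")


def salad_ratio(text: str, probs: Mapping[str, float], band: float = 0.1) -> float:
    """Fraction of distinct tokens that are near-neutral "noise".

    `probs` is a hypothetical token -> spam-probability mapping standing in
    for the classifier database.  A token counts as noise if it is unknown
    (treated as 0.5) or its probability lies within `band` of 0.5.  Because
    the result is a ratio over distinct tokens, it depends on the mix of
    clues rather than on how long the message is.
    """
    tokens = set(TOKEN_RE.findall(text.lower()))
    if not tokens:
        return 0.0
    noise = sum(1 for tok in tokens if abs(probs.get(tok, 0.5) - 0.5) <= band)
    return noise / len(tokens)
```

A score close to 1.0 would mean the message is mostly noise. Purely as a suggestion, the ratio could be bucketed into a synthetic token (a hypothetical name like `salad:0.8`) so the classifier learns how much it matters, instead of relying on a fixed threshold.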
Seth> - zombie hosts tend to be weak on SMTP etiquette, so one clue is
Seth> that they often fail to wait when asked; making the SMTP client
Seth> wait for 30 seconds before sending the "connect banner" often
Seth> tricks impatient zombies into spewing, and you can then hang up;
Yeah, but this is a job for postgrey and other similar tools.
Seth> - legitimate mail systems tend to have static IP's with properly
Seth> configured reverse DNS that matches their forward DNS; zombies
Seth> tend to have either no reverse DNS, or PTR records that do not
Seth> match their A records, and their forward DNS is often dynamic;
This might be something we can work with. SB could, in theory, check some
of these DNS properties for the addresses it finds in the Received:
headers. (I suppose Outlook mangles this information as well, though.)
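Here is a rough sketch of what such a check might look like, using only the Python standard library resolver. It pulls bracketed IPv4 addresses out of Received: header values, skips private and reserved ranges, and tests each remaining address for forward-confirmed reverse DNS, meaning the PTR name resolves back to the same address. The token names such as `received-dns:fcrdns` are hypothetical and are not existing SpamBayes tokens. The resolver functions are parameters so the logic can be exercised without network access. Received: headers added below your own MTA can be forged, so a real implementation would also have to decide which hops to trust.

```python
import ipaddress
import re
import socket
from typing import Callable, Iterator, List

# Bracketed dotted-quad IPv4 addresses, e.g. "[198.51.100.7]", a form many
# MTAs write into Received: headers.  IPv6 and unbracketed forms are ignored.
BRACKETED_IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")

# A resolver returns (hostname, aliaslist, ipaddrlist), like the socket calls.
Resolver = Callable[[str], tuple]


def received_ips(headers: List[str]) -> List[str]:
    """Return globally routable IPv4 addresses from Received: values, in order."""
    found = []
    for value in headers:
        for candidate in BRACKETED_IPV4.findall(value):
            try:
                addr = ipaddress.IPv4Address(candidate)
            except ipaddress.AddressValueError:
                continue  # octet out of range, e.g. "[999.1.1.1]"
            if addr.is_global:  # skip private, loopback and reserved ranges
                found.append(candidate)
    return found


def forward_confirmed(ip: str,
                      reverse: Resolver = socket.gethostbyaddr,
                      forward: Resolver = socket.gethostbyname_ex) -> bool:
    """True if ip's PTR name resolves (via A records) back to ip."""
    try:
        hostname, _aliases, _addrs = reverse(ip)        # PTR lookup
        _name, _aliases, addresses = forward(hostname)  # A lookup
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False
    return ip in addresses


def dns_tokens(headers: List[str],
               reverse: Resolver = socket.gethostbyaddr,
               forward: Resolver = socket.gethostbyname_ex) -> Iterator[str]:
    """Yield one hypothetical token per relay IP found in Received: headers."""
    for ip in received_ips(headers):
        if forward_confirmed(ip, reverse, forward):
            yield "received-dns:fcrdns"
        else:
            yield "received-dns:no-fcrdns"
```

Emitting tokens rather than a pass/fail verdict leaves it to training to decide how much a missing or mismatched PTR record should count. That seems more in keeping with the Bayesian approach than a hard rule.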
Seth> - legitimate mail systems generally identify themselves at the
Seth> beginning of the SMTP conversation with a legitimate host name;
Seth> zombies often try to use one of your host names, hoping to make
Seth> you think you are talking to a local host on your own network,
Seth> or a host name like "fred" that does not resolve to an IP
Seth> address;
Again, this is an MTA-level operation. I'm interested in finding more
things SB can do to classify email that gets past the MTA.
Seth> I don't know if you've played with rule-based spam filters that
Seth> use word lists and regular expressions, but it's an interesting
Seth> exercise and surprising how often our intuition is wrong.
It's been several years, but before SpamBayes I did use SpamAssassin.
Maybe we should move this discussion to spambayes-dev. I suspect we've put
many of the users/non-developers to sleep by now.
Skip