[Spambayes] date for new release to handle image spam?

skip at pobox.com
Fri Feb 2 16:23:00 CET 2007


    Seth> The word salad they use to drown out significant clues generally
    Seth> fails, but if they throw enough words at it, they sometimes dilute
    Seth> the spam clues sufficiently.  The fact that they throw hundreds of
    Seth> "noise" words at the filters for every spam clue they want to hide
    Seth> and Bayesian filters still catch half or three-quarters of it
    Seth> shows how powerful the Bayesian approach really is....

Hmmm... Could we do something to measure the amount of word salad without
penalizing large non-image emails?
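
One way to make that concrete is a ratio rather than a count, so message
length alone does not move the number.  The sketch below is only an
illustration: `salad_ratio`, `salad_token`, the `spamprob` callable, and the
0.1 band are all hypothetical names and values, not anything in SpamBayes
today.  It measures what fraction of a message's distinct tokens carry
essentially no evidence either way (probability close to 0.5), and turns
that into a bucketed synthetic token the classifier could train on.

```python
def salad_ratio(tokens, spamprob, band=0.1):
    """Return the fraction of distinct tokens that are 'bland'.

    tokens   -- iterable of token strings from one message
    spamprob -- callable mapping a token to a probability in [0, 1]
                (hypothetical; unknown words are assumed to map to 0.5)
    band     -- a token is bland if |prob - 0.5| < band (assumed cutoff)
    """
    distinct = set(tokens)
    if not distinct:
        return 0.0
    bland = sum(1 for t in distinct if abs(spamprob(t) - 0.5) < band)
    return bland / len(distinct)


def salad_token(ratio):
    """Bucket the ratio into tenths so it can be used as a single clue."""
    return "salad-ratio:%d" % int(ratio * 10)
```

Whether such a token actually separates ham from spam is something only a
test run against real corpora could show; long legitimate messages may
well contain plenty of bland words too.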

    Seth> - zombie hosts tend to be weak on SMTP etiquette, so one clue is
    Seth>   that they often fail to wait when asked; making the SMTP client
    Seth>   wait for 30 seconds before sending the "connect banner" often
    Seth>   tricks impatient zombies into spewing, and you can then hang up;

Yeah, but this is a job for postgrey and other similar tools.

    Seth> - legitimate mail systems tend to have static IP's with properly
    Seth>   configured reverse DNS that matches their forward DNS; zombies
    Seth>   tend to have either no reverse DNS, or PTR records that do not
    Seth>   match their A records, and their forward DNS is often dynamic;

This may be something we can work with.  SB could, in theory, check for
(some of) these DNS properties in addresses it finds in the Received:
headers.  (I suppose Outlook mangles this information as well though.)
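
A rough sketch of what such a check might look like, using only the
standard library.  The function names and the "rdns:..." token strings are
made up for illustration, and the regular expression assumes the relay's
address appears in square brackets in the Received: header, which is common
but not guaranteed.  The resolver functions are passed in as parameters so
the logic can be exercised without touching the network.

```python
import re
import socket

# Assumes addresses appear as [a.b.c.d] in Received: headers (IPv4 only).
IP_RE = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def received_ips(msg):
    """Collect bracketed IPv4 addresses from all Received: headers."""
    ips = []
    for header in msg.get_all("Received", []):
        ips.extend(IP_RE.findall(header))
    return ips


def rdns_clue(ip, reverse=socket.gethostbyaddr,
              forward=socket.gethostbyname_ex):
    """Return a synthetic token describing the address's DNS consistency."""
    try:
        name = reverse(ip)[0]          # PTR lookup
    except OSError:
        return "rdns:none"             # no reverse DNS at all
    try:
        addrs = forward(name)[2]       # A lookup on the PTR name
    except OSError:
        return "rdns:mismatch"         # PTR name does not resolve
    return "rdns:match" if ip in addrs else "rdns:mismatch"
```

Typical use would be something like
`[rdns_clue(ip) for ip in received_ips(msg)]` on a parsed `email` message.
The lookups are slow and can fail for reasons unrelated to spam, so any
real version would presumably need caching, timeouts, and a way to skip
private addresses.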

    Seth> - legitimate mail systems generally identify themselves at the
    Seth>   beginning of the SMTP conversation with a legitimate host name;
    Seth>   zombies often try to use one of your host names, hoping to make
    Seth>   you think you are talking to a local host on your own network,
    Seth>   or a host name like "fred" that does not resolve to an IP
    Seth>   address;

Again, this is an MTA-level operation.  I'm interested in finding more
things SB can do to classify email that gets by the MTA.

    Seth> I don't know if you've played with rule-based spam filters that
    Seth> use word lists and regular expressions, but it's an interesting
    Seth> exercise and surprising how often our intuition is wrong.

It's been several years, but before SpamBayes I did use SpamAssassin.

Maybe we should move this discussion to spambayes-dev.  I suspect we've put
many of the users/non-developers to sleep by now.

Skip

