[spambayes-dev] saving attachments

Seth Goodman sethg at GoodmanAssociates.com
Mon Mar 8 14:41:38 EST 2004


I have been accumulating a message corpus for testing that is now
becoming alarmingly large.  My cup runneth over.  AFAIK, SpamBayes
does nothing with attachments.  Neither the existence of one nor its
name, size or contents is considered.  While most of the spam in my
corpus is attachment-free, the ham has lots of them and many are quite
large (engineering drawing packages for review).  It would reduce the
size of the corpus .pst file considerably if I could delete all
attachments.  I have an inexpensive commercial tool that can do this;
however, I don't want to if anyone is considering using attachments in
future versions.
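
For anyone whose corpus is stored as individual RFC 2822 messages
rather than in a .pst file, the same stripping could probably be done
with Python's standard email package.  The following is only a minimal
sketch under that assumption: the function name strip_attachments is
hypothetical, it is not part of SpamBayes, and it drops only parts
whose Content-Disposition is "attachment".

```python
from email.message import EmailMessage


def strip_attachments(msg):
    """Remove, in place, every MIME part marked as an attachment.

    Hypothetical helper for illustration; not part of SpamBayes.
    """
    if not msg.is_multipart():
        # A single-part message has no sub-parts to drop.
        return msg
    kept = []
    for part in msg.get_payload():
        if part.get_content_disposition() == "attachment":
            continue  # skip the attachment entirely
        strip_attachments(part)  # nested multiparts may hold more
        kept.append(part)
    msg.set_payload(kept)
    return msg


if __name__ == "__main__":
    # Build a small example message with one attachment.
    msg = EmailMessage()
    msg["Subject"] = "drawings for review"
    msg.set_content("Please review the attached package.")
    msg.add_attachment(b"\x00" * 1024, maintype="application",
                       subtype="octet-stream", filename="package.dwg")
    strip_attachments(msg)
    print(len(msg.get_payload()), "part(s) left")
```

Inline parts such as the message body are left alone, so the text that
the classifier actually tokenizes should be unchanged.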

FWIW, I don't see attachments as having much potential for spam
detection.  The number of tokens could easily dwarf the original message
and need not be related to it in any way.  The last thing we want to do
is to encourage spammers to tack on huge attachments, though word salad
attacks have been totally ineffective on my machine and for most others
who have mentioned them on this list.  However, attaching the full text
of actual natural language works might have better luck, and I wouldn't
want to be responsible for encouraging that practice (i.e. really bad
karma, hate mail and death threats).  So I would think that continuing
to ignore attachments is a good strategy.

So, have there been any rumblings about possibly using attachment
information?  Am I reasonably safe in deleting all the attachments in my
message corpus for the foreseeable future?

--

Seth Goodman



