[Spambayes] Re: Deleting spam from the server using only the headers
David Mertz, Ph.D.
mertz at gnosis.cx
Tue Sep 23 21:00:18 EDT 2003
William Trenker <wtrenker at shaw.ca> wrote:
|Well, I just noticed that over on python-list at python.org, in a message
|titled _Re: pop3 email header classifier?_, David Mertz has pointed out
|some research he did a year ago on applying statistical methods for
|detecting spam in email headers. The innovation he implemented was to
|break the headers up into trigrams (sequences of three characters) and
|statistically look for suspicious patterns among the trigrams in the
|headers. David's article on IBM developerWorks
|(http://www-106.ibm.com/developerworks/linux/library/l-spamf.html)
|provides more details and a link to his prototype Python code.
I am indeed happy with my approach, and even fairly confident that the
trigram model will do better on headers alone than the word model will.
That said, a year ago when I wrote the article mentioned, Spambayes was
in its infancy, and I did not test it. I have not followed the work on
Spambayes closely, but I have followed it enough to know that Tim
Peters and others have done a lot of work exploring variations of
statistical models. I am quite certain Spambayes does a good deal better
than the naive Bayesian approach I use (which, on top of that, relies on
simplified weighting rules). I even seem to recall reading that Spambayes
has some N-gram options, or at least experimented with them.
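For readers unfamiliar with the trigram idea from the quoted text, here is a minimal sketch of tokenizing header text into overlapping three-character sequences and combining per-trigram evidence in a naive-Bayes style. It is an illustration under stated assumptions, not the code in 'spamfilter.py' and not the Spambayes tokenizer: the function names (`header_trigrams`, `train`, `spam_probability`), the Laplace smoothing, the equal class priors, and the use of trigram presence rather than counts are all hypothetical choices.

```python
from collections import Counter
from math import exp, log


def header_trigrams(headers: str) -> set:
    """Return the set of overlapping 3-character sequences in header text."""
    text = headers.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}


def train(samples):
    """samples: iterable of (header_text, is_spam) pairs.

    Returns, per class, how many messages contained each trigram,
    plus the number of messages seen in each class.
    """
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for text, is_spam in samples:
        counts[is_spam].update(header_trigrams(text))  # each trigram once per message
        totals[is_spam] += 1
    return counts, totals


def spam_probability(headers, counts, totals):
    """Sum per-trigram log-likelihood ratios (assumed equal priors) and
    map the result to a probability with the logistic function."""
    log_odds = 0.0
    for tri in header_trigrams(headers):
        # Laplace-smoothed chance that a spam / ham message contains this trigram.
        p_spam = (counts[True][tri] + 1) / (totals[True] + 2)
        p_ham = (counts[False][tri] + 1) / (totals[False] + 2)
        log_odds += log(p_spam) - log(p_ham)
    log_odds = max(-50.0, min(50.0, log_odds))  # avoid overflow in exp()
    return 1.0 / (1.0 + exp(-log_odds))
```

Trigrams that occur in every message (such as those from "Subject:" and "From:") contribute nothing, because their smoothed frequencies are equal in both classes; only distinguishing fragments of addresses, domains, and subject words move the score.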
Still, it would be nice to have a good, friendly system that does basically
what my homebrew 'spamfilter.py' does: run periodically, check
only headers, and delete obviously spammy messages without ever
downloading them.
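The header-only workflow just described can be sketched with Python's standard `poplib` module. This is a hedged illustration, not the behaviour of 'spamfilter.py' or of any Spambayes tool. It assumes the server supports the optional POP3 TOP command (used here with zero body lines so only headers are transferred), and the helper names `fetch_headers`, `purge_spam`, and `connect`, the 0.99 threshold, and the dry-run default are all hypothetical. The scorer is any callable that maps header text to a spam probability.

```python
import poplib
from typing import Callable, List


def fetch_headers(conn, msg_num: int) -> str:
    """Fetch only the headers of one message via POP3 TOP with 0 body lines."""
    _resp, lines, _octets = conn.top(msg_num, 0)
    return "\n".join(line.decode("latin-1") for line in lines)


def purge_spam(conn, score: Callable[[str], float],
               threshold: float = 0.99, dry_run: bool = True) -> List[int]:
    """Score each message's headers and flag those above the threshold.

    With dry_run=False, flagged messages are marked with DELE; per RFC 1939
    the server only removes them when the session ends with QUIT.
    Returns the flagged message numbers.
    """
    count, _size = conn.stat()
    flagged = []
    for n in range(1, count + 1):  # POP3 message numbers start at 1
        if score(fetch_headers(conn, n)) > threshold:
            flagged.append(n)
            if not dry_run:
                conn.dele(n)
    return flagged


def connect(host: str, user: str, password: str) -> poplib.POP3_SSL:
    """Open an authenticated POP3-over-SSL session (hypothetical helper)."""
    conn = poplib.POP3_SSL(host)
    conn.user(user)
    conn.pass_(password)
    return conn
```

A periodic job (cron, for instance) might call `connect(...)`, then `purge_spam(conn, my_scorer, dry_run=False)`, then `conn.quit()` to commit the deletions; running with `dry_run=True` first shows what would be removed without touching the mailbox.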
Yours, David...
--
mertz@ | The specter of free information is haunting the `Net! All the
gnosis | powers of IP- and crypto-tyranny have entered into an unholy
.cx | alliance...ideas have nothing to lose but their chains. Unite
| against "intellectual property" and anti-privacy regimes!