"Brad Clements" wrote
This is one way to do it, but I was planning on experimenting with tokenizers
that strip out HTML tags, leaving only the text.
On the set I'm working with, I found I needed to strip out everything except the src="" and href="" attributes of tags. Those have too much goodness in them for the system to get its teeth into.
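
For illustration, here is a rough sketch of that kind of stripping, using Python's standard html.parser module. It is not the code from either tokenizer discussed in this thread; the names TextAndLinkExtractor and strip_html are made up for the example, and keeping the attribute name next to its value is just one possible choice.

```python
from html.parser import HTMLParser


class TextAndLinkExtractor(HTMLParser):
    """Hypothetical helper: keep visible text plus src/href attribute values."""

    KEPT_ATTRS = ("src", "href")

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.pieces = []

    def handle_starttag(self, tag, attrs):
        # Drop the tag itself, but keep any src= or href= attribute.
        # HTMLParser has already lowercased the attribute names.
        for name, value in attrs:
            if name in self.KEPT_ATTRS and value:
                self.pieces.append(f'{name}="{value}"')

    def handle_data(self, data):
        # Text between tags is kept, minus surrounding whitespace.
        text = data.strip()
        if text:
            self.pieces.append(text)


def strip_html(html):
    """Return the text and src/href attributes of html as one string."""
    parser = TextAndLinkExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.pieces)


sample = ('<p>Buy <a href="http://example.com/x">now</a></p>'
          '<img src="http://example.com/i.gif">')
print(strip_html(sample))
```

The output string could then go through whatever whitespace or word tokenizer is already in use. Note that this sketch also keeps the contents of script and style elements as text, which a real tokenizer would probably want to handle separately.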
Tells me (spammer hat on) that I can send a message with a non-spammish text-only part and a spam HTML part. Most "non-techie" email client users automatically display the HTML version when available, but Tim's implementation will ignore it.
I've actually got a bunch of spam like that. The text/plain part is something like
**This is a HTML message**
and nothing else.
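
To make the shape of such a message concrete, here is a small sketch using Python's standard email package. It builds a multipart/alternative message with a decoy text/plain part and then walks every text part, so that a filter could look at the HTML alternative as well. The function names and the message contents are invented for the example, and this is not a description of how Tim's implementation handles MIME.

```python
from email.message import EmailMessage


def build_decoy_message():
    """Build an example message: innocuous text/plain, pitch in text/html."""
    msg = EmailMessage()
    msg["Subject"] = "hello"
    msg.set_content("This is a HTML message")
    # Adding an alternative turns the message into multipart/alternative.
    msg.add_alternative(
        "<html><body><p>Spammy pitch here</p></body></html>",
        subtype="html",
    )
    return msg


def text_parts(msg):
    """Return a list of (content_type, text) for every text/* part."""
    found = []
    for part in msg.walk():
        # walk() also yields the multipart container, which is skipped here.
        if part.get_content_maintype() == "text":
            found.append((part.get_content_type(), part.get_content()))
    return found


for ctype, text in text_parts(build_decoy_message()):
    print(ctype, repr(text))
```

A filter that only tokenizes the first text/plain part would see nothing but the decoy line, while one that iterates over all text parts also gets the HTML body to work with.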