[Python-Dev] Getting started with GBayes testing

Anthony Baxter <anthony@interlink.com.au>
Fri, 06 Sep 2002 00:28:25 +1000

>>> "Brad Clements" wrote
> This is one way to do it, but I was planning on experimenting with a
> tokenizer that strips out HTML tags, leaving only the text.

With the set I'm working with, I found I needed to strip out everything
but the src="" and href="" attributes of tags. There's too much goodness
in them for the system to get its teeth into.
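
As a rough illustration of that kind of stripping, here is a minimal
sketch using the standard library's html.parser. The names
LinkKeepingStripper and tokenize_html are made up for this example, and
it assumes the visible text is kept alongside the src/href values and
that tokens are simply split on whitespace. It is not the tokenizer
actually used in these experiments.

```python
from html.parser import HTMLParser


class LinkKeepingStripper(HTMLParser):
    """Drop HTML markup, keeping text plus src/href attribute values.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.pieces = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases attribute names, so HREF= and href= match.
        for name, value in attrs:
            if name in ("src", "href") and value:
                self.pieces.append(value)

    def handle_data(self, data):
        # Text between tags is kept (an assumption of this sketch).
        self.pieces.append(data)


def tokenize_html(html):
    """Return whitespace-separated tokens from the stripped HTML."""
    parser = LinkKeepingStripper()
    parser.feed(html)
    parser.close()
    return " ".join(parser.pieces).split()
```

Each URL comes through as a single token here; splitting URLs further
(on "/" or "." for instance) is a separate choice this sketch leaves out.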

> Tells me (spammer hat on) that I can send a message with a non-spammish
> text-only part and a spam HTML part, since most "non-techie" email client
> users automatically display the HTML version when available; however,
> Tim's implementation will ignore it.

I've actually got a bunch of spam like that. The text/plain part is
something like

**This is a HTML message** 

and nothing else.
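
To make the shape of such a message concrete, here is a small sketch
using the standard email package. The RAW message is an invented sample
(addresses, boundary and URL are placeholders) and text_parts is a
hypothetical helper. It only shows that walking every text/* part, rather
than the text/plain part alone, exposes the HTML body to whatever
tokenizer comes next.

```python
from email import message_from_string

# Invented sample: a stub text/plain part plus the real HTML payload.
RAW = """\
From: spammer@example.com
Subject: hello
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="XYZ"

--XYZ
Content-Type: text/plain

**This is a HTML message**
--XYZ
Content-Type: text/html

<html><body><a href="http://example.com/buy">Buy now</a></body></html>
--XYZ--
"""


def text_parts(msg):
    """Map each text/* content type to a list of decoded part bodies."""
    parts = {}
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skips the multipart container itself
        charset = part.get_content_charset() or "latin-1"
        body = part.get_payload(decode=True).decode(charset, errors="replace")
        parts.setdefault(part.get_content_type(), []).append(body)
    return parts


if __name__ == "__main__":
    for ctype, bodies in text_parts(message_from_string(RAW)).items():
        print(ctype, [b.strip() for b in bodies])
```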

