> i remember seeing this simple python function which would take raw html > and output the content (body?) of the page as plain text (no <..> tags > etc) http://www.aaronsw.com/2002/html2text/