Extracting text from HTML

VanL van.lindberg at gmail.com
Mon Aug 3 20:42:14 EDT 2009


Hello all,

Does anyone know of a good tool to get a minimally formatted text
document out of an HTML document? Something along the lines of what you
would get from "lynx -dump", but in Python.

I have lxml installed, so I can roll my own if I need to. However, this
seems like the sort of problem that someone has already solved.
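
For concreteness, here is a minimal sketch of what rolling my own might
look like. It assumes Python 3 and uses only the standard library's
html.parser rather than lxml, so it runs without extra installs. The
names TextDumper, html_to_text, BLOCK_TAGS and SKIP_TAGS are made up for
illustration, and the choice of which tags break lines is a guess. The
output is much cruder than what lynx produces: no link list, no table
layout, no indentation.

```python
from html.parser import HTMLParser

# Tags that start/end a line in the output (an illustrative choice).
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}
# Tags whose contents are dropped entirely.
SKIP_TAGS = {"script", "style"}


class TextDumper(HTMLParser):
    """Collects visible text, breaking lines at block-level tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string, one block per line."""
    parser = TextDumper()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

Calling html_to_text on a page's source returns a plain string with one
line per block element, which is roughly the level of formatting I am
after.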

Thanks,

Van




More information about the Python-list mailing list