HTML to formatted text conversion function

Rupert Scammell rupe at
Tue Jul 24 16:16:45 EDT 2001

Recently I've been using a call like os.system("/usr/bin/lynx -dump > /tmp/site-text.txt") to grab formatted text
versions of pages (without HTML) for subsequent processing.  However,
I don't like the fact that this technique introduces an additional
dependency (lynx) into my code. Could anyone recommend an equivalent
Python function or module that does this without introducing a
platform-specific dependency?

urllib.urlretrieve() gets back the raw HTML page, so it's not really
helpful to me, except as a starting point for processing.
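For readers on a current interpreter, here is a minimal sketch of a pure-Python stand-in for `lynx -dump`, assuming Python 3 and only its standard library (`html.parser.HTMLParser`, `urllib.request.urlopen`, `textwrap`). It is a rough approximation: it drops tags, skips `<script>`, `<style>` and `<title>` contents, starts a new paragraph at a hand-picked set of block tags, and wraps the result. It does no table layout or link numbering the way lynx does. The names `html_to_text`, `url_to_text`, `_TextExtractor`, `BLOCK_TAGS` and `SKIP_TAGS` are hypothetical, chosen only for illustration. (The 2.x-era `htmllib`/`formatter` modules offered a similar route but no longer exist in current Python 3.)

```python
# Hypothetical sketch: a rough pure-Python stand-in for "lynx -dump",
# using only the Python 3 standard library. Much simpler than lynx:
# no table layout, no link numbering, basic block-tag handling only.
import textwrap
import urllib.request
from html.parser import HTMLParser

# Tags that start a new paragraph in the output (an assumed, partial list).
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5",
              "h6", "ul", "ol", "table", "blockquote", "pre", "hr"}
# Tags whose contents are never shown in the text output.
SKIP_TAGS = {"script", "style", "title"}


class _TextExtractor(HTMLParser):
    """Collects visible text, splitting it into paragraphs at block tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decodes &amp; etc. for us
        self.paragraphs = []
        self._current = []
        self._skip_depth = 0

    def _flush(self):
        # Join buffered text and collapse runs of whitespace to single spaces.
        text = " ".join("".join(self._current).split())
        if text:
            self.paragraphs.append(text)
        self._current = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if not self._skip_depth:
            self._current.append(data)

    def close(self):
        super().close()  # processes any buffered input first
        self._flush()


def html_to_text(html, width=72):
    """Convert an HTML string to wrapped text, blank line between paragraphs."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(textwrap.fill(p, width) for p in parser.paragraphs)


def url_to_text(url, width=72):
    """Fetch a page and convert it; falls back to UTF-8 (an assumption)."""
    with urllib.request.urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    return html_to_text(html, width)
```

Usage would be something like `print(url_to_text("http://www.python.org/"))`. Keeping parsing (`html_to_text`) separate from fetching (`url_to_text`) means HTML already retrieved with `urlretrieve()` or read from disk can be converted without another download.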

Thanks in advance,

Rupert Scammell
rupe at
