HTML to formatted text conversion function

Rupert Scammell rupe at metro.yak.net
Tue Jul 24 22:16:45 CEST 2001


Recently I've been using a call like

    os.system("/usr/bin/lynx -dump http://www.sample.com > /tmp/site-text.txt")

to grab formatted text versions of pages (without HTML) for subsequent
processing.  However, I don't like that this technique adds an external
dependency (lynx) to my code.  Can anyone recommend an equivalent
Python function or module that does this without introducing a
platform-specific dependency?

urllib.urlretrieve() gets back the raw HTML page, so it's not really
helpful to me, except as a starting point for processing.
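
To make the goal concrete, here is a rough sketch of the kind of
pure-Python conversion I have in mind, assuming a current Python 3
standard library (html.parser). The names TextExtractor and
html_to_text are made up for illustration, and the output is only an
approximation: it does not reproduce lynx's layout, link numbering, or
table rendering.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collect visible text from an HTML document."""

    # Tags that should start/end a new line in the text output.
    BLOCK_TAGS = {"p", "div", "br", "li", "tr", "pre", "blockquote",
                  "h1", "h2", "h3", "h4", "h5", "h6"}
    # Tags whose contents are never shown to the reader.
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities like &amp; into characters.
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            if self._skip_depth:
                self._skip_depth -= 1
        elif tag in self.BLOCK_TAGS:
            self._chunks.append("\n")

    def handle_data(self, data):
        # Keep text only when we are not inside <script> or <style>.
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        raw = "".join(self._chunks)
        # Collapse runs of whitespace within each line, drop empty lines.
        lines = [" ".join(line.split()) for line in raw.splitlines()]
        return "\n".join(line for line in lines if line)


def html_to_text(html):
    """Return a plain-text approximation of the given HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

The HTML string could come from any fetch routine (for example, the
decoded result of urllib.request.urlopen(url).read() in Python 3), so
the conversion step itself needs nothing outside the standard library.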

Thanks in advance,

Rupert Scammell
rupe at metro.yak.net
http://metro.yak.net


