Python equivalent of "lynx -dump"?
kputland at servicemagic.com
Thu Mar 30 23:27:08 CEST 2000
<ben at co.and.co> wrote in message
news:CDQD4.44133$ds6.91068 at afrodite.telenet-ops.be...
> lewst <lewst at yahoo.com> wrote:
> > I'm looking for a functional equivalent of the "-dump" option to the
> > lynx web-browser in Python. "-dump" dumps the formatted output of an
> > HTML document.
> > Right now I have a python program that captures the output of a
> > webpage and prints it like so:
> > lynxcmd = "lynx -dump %s" %url
> > data = os.popen(lynxcmd).read()
> > print data
> An all Python solution is a little bit more complicated:
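> import htmllib, formatter
> p = htmllib.HTMLParser(formatter.AbstractFormatter(formatter.DumbWriter()))
> f = open('test.html')
> p.feed(f.read())
> p.close()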
> If you want a writer that knows how to write lists (<ol>), look for a
> class called LessDumbWriter posted last Friday (by me).
> ben . de . rydt at pandora . be ------------------ your comments
> http://users.pandora.be/bdr/ ------- inl. IPv6, Linux en Pandora
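The htmllib and formatter modules imported in the quoted snippet exist only in
Python 2 (htmllib was removed in Python 3.0, formatter in 3.10). As a rough
sketch of the same idea on Python 3, the standard html.parser module can be
subclassed to collect readable text. This is not the LessDumbWriter or
html2txt.py code mentioned in this thread; the names TextDumper and dump_html
are made up for illustration, and the output is far simpler than what
"lynx -dump" produces (no links, tables, or form fields).

```python
from html.parser import HTMLParser


class TextDumper(HTMLParser):
    """Hypothetical helper: collects readable text and numbers <ol> items."""

    BLOCK_TAGS = {"p", "div", "br", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.lines = []      # finished output lines
        self.current = []    # text fragments of the line being built
        self.prefix = ""     # list marker for the line being built
        self.lists = []      # stack: None for <ul>, running counter for <ol>
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def flush(self):
        # Collapse whitespace runs and emit the pending line, if it has text.
        text = " ".join("".join(self.current).split())
        if text:
            self.lines.append(self.prefix + text)
        self.current = []
        self.prefix = ""

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self.skip_depth += 1
        elif tag in ("ol", "ul"):
            self.flush()
            self.lists.append(0 if tag == "ol" else None)
        elif tag == "li":
            self.flush()
            indent = "  " * len(self.lists)
            if self.lists and self.lists[-1] is not None:
                self.lists[-1] += 1          # next number in this <ol>
                self.prefix = "%s%d. " % (indent, self.lists[-1])
            else:
                self.prefix = indent + "* "  # bullet for <ul> or a stray <li>
        elif tag in self.BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ("ol", "ul"):
            self.flush()
            if self.lists:
                self.lists.pop()
        elif tag == "li" or tag in self.BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if not self.skip_depth:
            self.current.append(data)


def dump_html(html):
    """Return a plain-text rendering of an HTML string."""
    parser = TextDumper()
    parser.feed(html)
    parser.close()   # deliver any text still buffered by the parser
    parser.flush()   # emit the last pending line
    return "\n".join(parser.lines)


if __name__ == "__main__":
    print(dump_html("<h1>Title</h1><ol><li>first</li><li>second</li></ol>"))
```

To dump a live page instead of a string, the HTML could be fetched first with
urllib.request.urlopen(url).read() and decoded before being passed to
dump_html.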
I've also worked on a module that handles form fields, textareas, checkboxes,
option groups, <select>, <select MULTIPLE>, etc. I've called it
html2txt.py. It also handles tables in a decent, readable manner. Maybe
LessDumbWriter could add <ol> support to html2txt.py.
If anyone is interested, email me at kputland at servicemagic.com and I can send