Python equivalent of "lynx -dump"?

Karl Putland kputland at servicemagic.com
Thu Mar 30 23:27:08 CEST 2000


<ben at co.and.co> wrote in message
news:CDQD4.44133$ds6.91068 at afrodite.telenet-ops.be...
> lewst <lewst at yahoo.com> wrote:
> > I'm looking for a functional equivalent of the "-dump" option to the
> > lynx web-browser in Python.  "-dump" dumps the formatted output of an
> > HTML document.
>
> > Right now I have a python program that captures the output of a
> > webpage and prints it like so:
>
> >         lynxcmd = "lynx -dump %s" %url
> >         data = os.popen(lynxcmd).read()
> >         print data
>
> An all Python solution is a little bit more complicated:
>
> import htmllib, formatter
>
> p = htmllib.HTMLParser(formatter.AbstractFormatter(formatter.DumbWriter()))
> f = open('test.html')
> p.feed(f.read())
> p.close()
> f.close()
>
> If you want a writer that knows how to write lists (<ol>), look for a
> message called LessDumbWriter posted last Friday (by me).
>
> Greeting,
> --
> ben . de . rydt at pandora . be ------------------ your comments
> http://users.pandora.be/bdr/ ------- inl. IPv6, Linux en Pandora
>


I've also worked on a module that handles form fields, textareas, check
boxes, option groups, <select>, <select MULTIPLE>, etc.  I've called it
html2txt.py. It also handles tables in a decently readable manner.  Maybe the
<ol> support from LessDumbWriter could be added to html2txt.py.

If anyone is interested, email me at kputland at servicemagic.com and I can send
a copy.

--Karl Putland
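
Note for readers on current Python: htmllib was removed in Python 3.0 and
the formatter module in Python 3.10, so the quoted snippet only runs on old
interpreters. A minimal sketch of the same idea with the standard
html.parser module might look like the following. The names TextDumper and
html_to_text are made up for illustration (this is not html2txt.py or
LessDumbWriter), and the tag handling is a deliberately small assumption:
block tags become line breaks, <ol> items are numbered, and tables and form
fields are not handled at all.

```python
import re
from html.parser import HTMLParser


class TextDumper(HTMLParser):
    """Hypothetical helper: collect visible text, numbering <ol> items."""

    SKIP = {"script", "style"}                      # content never shown
    BLOCK = {"p", "div", "br", "h1", "h2", "h3", "li", "tr"}

    def __init__(self):
        super().__init__(convert_charrefs=True)     # &amp; arrives as "&"
        self.parts = []
        self.skip_depth = 0
        # One entry per open list: an int counter for <ol>, None for <ul>.
        self.list_stack = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in ("ol", "ul"):
            self.list_stack.append(0 if tag == "ol" else None)
            self.parts.append("\n")
        elif tag == "li":
            self.parts.append("\n")
            if self.list_stack and self.list_stack[-1] is not None:
                self.list_stack[-1] += 1
                self.parts.append("%d. " % self.list_stack[-1])
            else:
                self.parts.append("* ")
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            if self.skip_depth:
                self.skip_depth -= 1
        elif tag in ("ol", "ul"):
            if self.list_stack:
                self.list_stack.pop()
            self.parts.append("\n")
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            # Source whitespace (including newlines) is not significant.
            self.parts.append(re.sub(r"\s+", " ", data))

    def get_text(self):
        lines = "".join(self.parts).split("\n")
        lines = [re.sub(r" +", " ", line).strip() for line in lines]
        return "\n".join(line for line in lines if line)


def html_to_text(html):
    """Return a rough plain-text rendering of an HTML string."""
    parser = TextDumper()
    parser.feed(html)
    parser.close()
    return parser.get_text()
```

Calling print(html_to_text(open('test.html').read())) would then play the
role of the quoted htmllib snippet. The output is far cruder than what
"lynx -dump" produces (no wrapping, no link list), so treat it as a
starting point only.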




