Python equivalent of "lynx -dump"?
Fredrik Lundh
effbot at telia.com
Wed Mar 29 19:21:49 EST 2000
lewst <lewst at yahoo.com> wrote:
> > An all Python solution is a little bit more complicated:
> >
> > import htmllib, formatter
> >
> > p = htmllib.HTMLParser(formatter.AbstractFormatter(formatter.DumbWriter()))
> > f = open('test.html')
> > p.feed(f.read())
> > p.close()
> > f.close()
>
> Yes, but how can I store the output of "p.feed(f.read())" in a
> variable such as `data', like I'm doing above with lynxcmd? Your
> code writes everything out to the terminal.
did you read the fine manual?
http://www.python.org/doc/current/lib/writer-impls.html
DumbWriter ([file[, maxcol = 72]])
Simple writer class which writes output on the file object
passed in as file or, if file is omitted, on standard output.
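to make that sentence concrete, here is a rough sketch of the "optional file argument" pattern. TinyWriter and send_line are hypothetical names made up for illustration; this is not the real DumbWriter, it only mimics the "file, or standard output if omitted" behaviour described above:

```python
import io
import sys


class TinyWriter:
    """Hypothetical stand-in for a writer that takes an optional file object."""

    def __init__(self, file=None):
        # fall back to standard output when no file object is passed in
        self.file = sys.stdout if file is None else file

    def send_line(self, text):
        # anything with a write() method works as the target
        self.file.write(text + "\n")


# pass an in-memory file object to capture the output instead of printing it
buf = io.StringIO()
TinyWriter(buf).send_line("hello")
captured = buf.getvalue()
```

the point is only that the writer does not care where the text goes, as long as the object it is given has a write() method.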
in your case, using a StringIO file object is probably the best
solution:
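import StringIO
import htmllib, formatter
# in-memory file object that collects the formatted text
file = StringIO.StringIO()
# build formatting pipeline
w = formatter.DumbWriter(file)
f = formatter.AbstractFormatter(w)
p = htmllib.HTMLParser(f)
...
# everything the writer produced is now available as a string
data = file.getvalue()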
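htmllib and formatter are no longer part of current Python releases, so the snippet above only runs on old interpreters. the same idea (send the text to an in-memory file object, then read it back) could be sketched for Python 3 roughly as follows. this swaps in html.parser and io.StringIO; TextDumper, BLOCK_TAGS and html_to_text are hypothetical names, and the output is much cruder than what lynx or DumbWriter would give you:

```python
import io
from html.parser import HTMLParser

# assumption: a small, hand-picked set of tags that should end a line
BLOCK_TAGS = ("p", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6")


class TextDumper(HTMLParser):
    """Hypothetical parser that writes the text content to a file object."""

    def __init__(self, file):
        super().__init__()
        self.file = file
        self.skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            if self.skip:
                self.skip -= 1
        elif tag in BLOCK_TAGS:
            # end the current line after block-level elements
            self.file.write("\n")

    def handle_data(self, data):
        # ignore script and style bodies, pass everything else through
        if not self.skip:
            self.file.write(data)


def html_to_text(html):
    """Run the parser over an HTML string and return the collected text."""
    buf = io.StringIO()
    parser = TextDumper(buf)
    parser.feed(html)
    parser.close()
    return buf.getvalue()


data = html_to_text(
    "<html><body><h1>Title</h1><p>Hello <b>world</b></p></body></html>"
)
```

as in the original snippet, nothing is written to the terminal; the result ends up in `data`.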
</F>
<!-- (the eff-bot guide to) the standard python library:
http://www.pythonware.com/people/fredrik/librarybook.htm
-->