question on HTMLParser and parser.feed()
Peter Otten
__peter__ at web.de
Sat Dec 6 04:00:53 EST 2003
Stephen Briley wrote:
> I am satisfied with the HTMLparse of my htmlsource
> page. But I am unable to save the output of
My guess is that in the long run you will be even more satisfied with the
HTMLParser in the HTMLParser module - it has a cleaner interface and can
handle XHTML.
> parser.feed(htmlsource). When I type
> parser.feed(htmlsource) into the interpreter, the
> correct output streams across the screen. But all of
> my attempts to capture this output to a variable are
> unsuccessful (e.g. capt_text =
> parser.feed(htmlsource)).
>
> What am I missing and how can I get this to work?
> Thanks in advance!
>
>
> from htmllib import HTMLParser
> from formatter import AbstractFormatter, DumbWriter
> parser = HTMLParser(AbstractFormatter(DumbWriter()))
> parser.feed(htmlsource)
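parser.feed() always returns None; the text you see is written by the
DumbWriter, which defaults to sys.stdout. You can pass a file object to the
DumbWriter to send the formatter output to a file instead: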
outstream = open("tmp.txt", "w")
parser = HTMLParser(AbstractFormatter(DumbWriter(outstream)))
parser.feed(htmlsource)
outstream.close()
If you don't want to write the output to disk, you can instead provide a
StringIO instance, which behaves like a file but keeps everything in memory:
# cStringIO contains the faster version of StringIO
from cStringIO import StringIO
from htmllib import HTMLParser
from formatter import AbstractFormatter, DumbWriter
htmlsource = """
<html>
<head><title>Hello world</title></head>
<body>For demonstration purposes</body>
</html>
"""
outstream = StringIO()
parser = HTMLParser(AbstractFormatter(DumbWriter(outstream)))
parser.feed(htmlsource)
data = outstream.getvalue()
outstream.close()
# your code here, I just print it in uppercase
print data.upper()
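For comparison, here is a minimal sketch of the HTMLParser-module approach
recommended at the top of this reply. It assumes Python 3, where that module
is named html.parser and StringIO lives in io. The class name TextCollector
is made up for illustration. Unlike the DumbWriter it does no wrapping or
formatting; it only collects the raw text between tags, title included:

```python
from html.parser import HTMLParser
from io import StringIO


class TextCollector(HTMLParser):
    """Hypothetical helper: gathers text nodes in an in-memory buffer."""

    def __init__(self):
        super().__init__()
        self.outstream = StringIO()

    def handle_data(self, data):
        # Called by the parser for each run of text between tags.
        self.outstream.write(data)


htmlsource = """
<html>
<head><title>Hello world</title></head>
<body>For demonstration purposes</body>
</html>
"""

parser = TextCollector()
result = parser.feed(htmlsource)  # feed() returns None here as well
parser.close()                    # flush any text still buffered
data = parser.outstream.getvalue()

# your code here, I just print it in uppercase
print(data.upper())
```

The pattern is the same as above: feed() only drives the parser, and the
text ends up in whatever object your handler writes to.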
Peter