[Tutor] htmllib, formatter and results as a string [StringIO]

Danny Yoo dyoo@hkn.eecs.berkeley.edu
Thu, 4 Oct 2001 00:06:24 -0700 (PDT)


On Wed, 3 Oct 2001, Scott Griffitts wrote:

> I'm trying to take a html file and format it in plain text (with some other
> formatting thrown in - see below).  The code below gets me pretty close:
> 
> import htmllib, formatter
> 
> w = formatter.DumbWriter()

Hmmm.. I believe that the DumbWriter always prints out to "standard
output".  Let me check.

    http://www.python.org/doc/lib/writer-impls.html

Ah!  According to the documentation, we can give DumbWriter() a file to
write its output to.  By default, it will print to stdout, but we can tell
it to write somewhere else.

There's a very nice class called StringIO() that makes a string look very
much like a file:

    http://www.python.org/doc/lib/module-StringIO.html

so we can tie all this together as:

###
import StringIO, htmllib, formatter
my_stringio = StringIO.StringIO()     ## make an instance of this
                                      ## file-like string thing
w = formatter.DumbWriter(my_stringio)
f = formatter.AbstractFormatter(w)
file = open('C:/test/test.html')      ## single forward slashes are
                                      ## enough here: Windows accepts
                                      ## them in paths, so there's no
                                      ## need to double them up.
p = htmllib.HTMLParser(f)
p.feed(file.read())                   ## Everything else stays the
p.close()                             ## same... but...
file.close()

content = my_stringio.getvalue()      ## Here's the interesting bit!
###
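
For anyone reading this on a newer Python: htmllib and formatter are Python 2 modules and are no longer in the standard library, and StringIO now lives in the io module. Here's a rough sketch of the same "collect the output in a string" idea using html.parser instead. The names TextDumper and html_to_text and the sample HTML are made up for illustration, and this version only collects the text between tags; it does not do the line wrapping or paragraph formatting that DumbWriter does.

```python
import io
from html.parser import HTMLParser


class TextDumper(HTMLParser):
    """Hypothetical helper: writes the text between tags to a file-like object."""

    def __init__(self, out):
        super().__init__()
        self.out = out

    def handle_data(self, data):
        # HTMLParser calls this for each run of text between tags.
        self.out.write(data)


def html_to_text(html):
    """Feed an HTML string to the parser and return the collected text."""
    buf = io.StringIO()          # the file-like string thing
    parser = TextDumper(buf)
    parser.feed(html)
    parser.close()
    return buf.getvalue()        # same trick as before: pull the string out


content = html_to_text("<h1>Title</h1><p>Hello, <b>world</b>!</p>")
print(content)
```

The important part is unchanged: the parser writes into a StringIO object, and getvalue() hands us back an ordinary string that we can tweak further.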


> But I want to do some additional formatting.  The p.feed(file.read())
> part of the code is (to my newbie understanding) sending the result to
> stdout.  I would like to catch it as a string so I can perform some
> further tweaking. How do I do that?

If we hadn't been able to give that StringIO file object to the
DumbWriter, we could still do something: we could redirect standard output
to a file-like object of our choice, like this:

###
import sys, StringIO
my_stringio = StringIO.StringIO()
sys.stdout = my_stringio      ## redirect stdout to be our string file

print "Hello world, do you see this?"

sys.stdout = sys.__stdout__   ## set it back to the original stdout
print "Here's what we captured from stdout:", my_stringio.getvalue()
###

This is a bit hacky, but it is sometimes necessary.  Nothing in Python's
standard library should force us to use this hack... so if you find
something that does, tell us about it, and we'll complain.  *grin*
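
If you're on Python 3.4 or later, contextlib.redirect_stdout does the same redirection and puts stdout back for us when the block ends, even if something inside raises an exception. A minimal sketch, assuming Python 3 (so io.StringIO and print as a function); the variable names are just for illustration:

```python
import contextlib
import io

buf = io.StringIO()

# Everything printed inside the "with" block lands in buf instead of the
# real stdout; the previous stdout is restored when the block exits.
with contextlib.redirect_stdout(buf):
    print("Hello world, do you see this?")

captured = buf.getvalue()
print("Here's what we captured from stdout:", captured)
```

Because the previous stdout is restored automatically, there's no need to remember the assignment back to sys.__stdout__.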

Either way, StringIO is a lot of fun to play with.  Hope this helps!