Buffering HTML as HTMLParser reads it?
Paul McGuire
ptmcg at austin.rr.com
Wed Aug 1 16:08:08 EDT 2007
On Aug 1, 1:31 pm, chris... at gmail.com wrote:
<snip>
>
> I'm thinking maybe somehow have HTMLParser append each character it
> reads except for data inside tags in some kind of buffer? This way I
> can have the HTML contents read into a buffer, then when I do my own
> handle_ overrides, I can also append to that buffer with the
> transformed data. Once the HTML page is finished parsing, ideally I
> would be able to print the contents of the buffer and the HTML would
> be identical except for the string transformations.
>
> I also need to make sure that all newlines, tags, spacing, etc are
> kept intact -- this part is a requirement for other reasons.
>
> Thanks!
What you describe is almost exactly how pyparsing implements
transformString. See below:
from pyparsing import *
boldStart,boldEnd = makeHTMLTags("B")
# convert <B> to <div class="emphatic"> and </B> to </div>
boldStart.setParseAction(replaceWith('<div class="emphatic">'))
boldEnd.setParseAction(replaceWith('</div>'))
converter = boldStart | boldEnd
html = "Display this in <b>bold</b>"
print(converter.transformString(html))
Prints:
Display this in <div class="emphatic">bold</div>
All text not matched by a pattern in the converter is left as-is. (My
CSS style/form may not be up to date, but I hope you get the idea.)
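
For comparison, the buffer idea from the original question can also be sketched with the standard library alone. This is only a rough sketch, assuming Python 3's html.parser module; the names BufferingParser, transform_text and the transform argument are made up for illustration and are not part of HTMLParser or pyparsing. Start tags are copied verbatim via get_starttag_text(), but end tags are rebuilt from the lowercased tag name, so </B> would come back as </b>.

```python
from html.parser import HTMLParser


class BufferingParser(HTMLParser):
    """Re-emit the HTML as it is read, transforming only the text between tags.

    Hypothetical helper class, not part of the standard library.
    """

    def __init__(self, transform):
        # convert_charrefs=False reports &amp; and &#38; as separate events,
        # so they can be written back unchanged instead of being decoded.
        super().__init__(convert_charrefs=False)
        self.transform = transform
        self.buffer = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the start tag exactly as it appeared.
        self.buffer.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are also copied verbatim.
        self.buffer.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # Only the lowercased name is available here, so the original
        # case and any stray whitespace in the end tag are lost.
        self.buffer.append("</%s>" % tag)

    def handle_data(self, data):
        # The only place where the content is changed. Note that this also
        # sees the contents of <script> and <style> elements.
        self.buffer.append(self.transform(data))

    def handle_entityref(self, name):
        self.buffer.append("&%s;" % name)

    def handle_charref(self, name):
        self.buffer.append("&#%s;" % name)

    def handle_comment(self, data):
        self.buffer.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.buffer.append("<!%s>" % decl)

    def handle_pi(self, data):
        self.buffer.append("<?%s>" % data)


def transform_text(html, transform):
    """Return html with transform applied to the text outside of tags."""
    parser = BufferingParser(transform)
    parser.feed(html)
    parser.close()
    return "".join(parser.buffer)


print(transform_text("Display this in <b>bold</b>", str.upper))
```

Here str.upper stands in for whatever string transformation is actually needed. The tags pass through untouched and only the text between them is changed.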
-- Paul