Html: replacing tags
Grzegorz Adam Hankiewicz
gradha at titanium.sabren.com
Fri Jun 13 20:58:04 CEST 2003
On 2003-06-08, Andrei <see at my.signature.com> wrote:
> I'm working on an RSS aggregator and I'd like to replace all
> img-tags in a piece of html with links to the image, thereby
> using the alt-text of the img as link text (if present). The
> rest of the html, including tags, should stay as-is. I'm capable
> of doing this in what feels like the dumb way (parsing it with
> regexes for example, or plain old string splitting and rejoining),
> but I have this impression the HTMLParser or htmllib module should
> be able to help me with this task.
> However, I can't figure out how (if?) I can make a parser do this.
Yes, HTMLParser only parses, but you do this subclassing, and you can
override behaviour. What I do is to subclass HTMLParser and subclass
all methods to add their parameters nearly as is to a list of the
class object. Then, when the parsing has finished you can retrieve
this list and join in to get a string with the original HTML.
Of course, inside the handle_start|end|tag you can test the tag
being parsed and insert it as is or subsitute it with something else.
Please don't send me private copies of your public answers. Thanks.
More information about the Python-list