Html: replacing tags

Grzegorz Adam Hankiewicz gradha at titanium.sabren.com
Fri Jun 13 14:58:04 EDT 2003


On 2003-06-08, Andrei <see at my.signature.com> wrote:
> I'm working on an RSS aggregator and I'd like to replace all
> img-tags in a piece of html with links to the image, thereby
> using the alt-text of the img as link text (if present). The
> rest of the html, including tags, should stay as-is. I'm capable
> of doing this in what feels like the dumb way (parsing it with
> regexes for example, or plain old string splitting and rejoining),
> but I have this impression the HTMLParser or htmllib module should
> be able to help me with this task.
> 
> However, I can't figure out how (if?) I can make a parser do this.

Yes, HTMLParser only parses, but you can subclass it and override its
behaviour.  What I do is subclass HTMLParser and override all the
handler methods so that each one appends its parameters, nearly as is,
to a list kept on the instance.  When the parsing has finished you can
retrieve this list and join it to get a string with the original HTML.

Of course, inside handle_starttag and handle_endtag you can test the
tag being parsed and either insert it as is or substitute something
else for it.
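
A minimal sketch of the idea, not a tested production filter.  It
assumes Python 3, where the parser lives in html.parser; the names
ImgToLink, pieces and replace_images are made up for this example.
Every handler appends its input to a list, and only img tags are
swapped for a link that uses the alt text (or the src, if there is
no alt) as link text.

```python
from html import escape
from html.parser import HTMLParser


class ImgToLink(HTMLParser):
    """Re-emit the parsed HTML, replacing <img> tags with links."""

    def __init__(self):
        # convert_charrefs=False makes the parser report entities to
        # handle_entityref/handle_charref so they can be re-emitted.
        super().__init__(convert_charrefs=False)
        self.pieces = []  # hypothetical name: output fragments

    def _img_link(self, attrs):
        # attrs is a list of (name, value) pairs, already unescaped.
        attrs = dict(attrs)
        src = attrs.get("src") or ""
        text = attrs.get("alt") or src
        return '<a href="%s">%s</a>' % (escape(src, quote=True), escape(text))

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.pieces.append(self._img_link(attrs))
        else:
            # Raw text of the start tag exactly as it appeared.
            self.pieces.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Called for the self-closing form, e.g. <img ... /> or <br/>.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        # Tag names arrive lowercased, so the original case is lost.
        if tag != "img":
            self.pieces.append("</%s>" % tag)

    def handle_data(self, data):
        self.pieces.append(data)

    def handle_entityref(self, name):
        self.pieces.append("&%s;" % name)

    def handle_charref(self, name):
        self.pieces.append("&#%s;" % name)

    def handle_comment(self, data):
        self.pieces.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.pieces.append("<!%s>" % decl)


def replace_images(html):
    parser = ImgToLink()
    parser.feed(html)
    parser.close()
    return "".join(parser.pieces)
```

Calling replace_images() on a fragment returns the rebuilt string.
The output is "nearly" the input rather than byte-identical: end tags
come back lowercased, and anything without a handler here (processing
instructions, CDATA sections) is dropped.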

-- 
 Please don't send me private copies of your public answers. Thanks.




