Html: replacing tags

Andrei
Sun Jun 8 18:29:31 CEST 2003

Originally posted by Fredrik Lundh 
> Andrei wrote:
> > I'm working on an RSS aggregator and I'd like to replace all img-tags in
> > a piece of html with links to the image, thereby using the alt-text of
> > the img as link text (if present). The rest of the html, including tags,
> > should stay as-is. I'm capable of doing this in what feels like the dumb
> > way (parsing it with regexes for example, or plain old string splitting
> > and rejoining), but I have this impression the HTMLParser or htmllib
> > module should be able to help me with this task.
> > However, I can't figure out how (if?) I can make a parser do this. Does
> > the formatter module fit in here somewhere? The docs, the effbot's guide
> > and the posts regarding html only seem to highlight getting data out of
> > the html (retrieving links seems particularly popular), not replacing
> > tags with other ones.
> the term "parser" usually refers to a piece of software that reads a
> character stream, and turns it into some other data structure.
> if you want to modify a character stream, you have to combine the
> parser with code that turns that data structure back to a character
> stream.
> the "Using the sgmllib Module to Filter SGML Documents" example in
> chapter 5 of my "Python Standard Library" book (pdf) does exactly that.
> (you can use a similar approach with HTMLParser, but htmllib is
> designed for HTML formatting, not HTML parsing, and is not the
> right tool for the task) 

Thanks, I'll look into sgmllib then (I already have that chapter on my
HD :), but it didn't really occur to me to look at sgml).
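
For anyone reading this later: the parse-then-write-back idea described above can be sketched roughly as follows. This is not the example from the book. It is a minimal sketch that assumes a current Python, where sgmllib is no longer available and the HTMLParser module lives at html.parser. The names ImgToLinkFilter and replace_images are made up for illustration, and falling back to the image URL when there is no alt text is my own assumption about what the link text should be in that case.

```python
from html import escape
from html.parser import HTMLParser


class ImgToLinkFilter(HTMLParser):
    """Re-emit the HTML it is fed, but turn <img> tags into links.

    Hypothetical helper class, not part of the standard library.
    """

    def __init__(self):
        # convert_charrefs=False so entities such as &amp; are passed to
        # handle_entityref/handle_charref and can be written back as-is.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _img_link(self, attrs):
        attrs = dict(attrs)
        src = attrs.get("src") or ""
        # Assumption: use the URL as link text when alt is missing or empty.
        text = attrs.get("alt") or src
        # Attribute values arrive unescaped, so escape them again.
        return '<a href="%s">%s</a>' % (escape(src, quote=True), escape(text))

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.out.append(self._img_link(attrs))
        else:
            # Write the tag back exactly as it appeared in the input.
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Called for self-closing tags such as <img ... /> or <br/>.
        if tag == "img":
            self.out.append(self._img_link(attrs))
        else:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != "img":  # drop a stray </img>, the link is already closed
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    def handle_pi(self, data):
        self.out.append("<?%s>" % data)


def replace_images(html):
    """Return html with every <img> replaced by a link to the image."""
    parser = ImgToLinkFilter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Calling replace_images() on a feed item's html should leave everything except the img tags alone, apart from end tags, which come back in lowercase because the parser only reports the normalized tag name. Messy real-world markup may need more care than this sketch takes.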

Contact info (decode with rot13): cebwrpg5 at bcrenznvy.pbz
Fcnzserr! Cyrnfr qb abg hfr va choyvp zrffntrf. V ernq gur yvfg, ab arrq gb PP.

More information about the Python-list mailing list