Html: replacing tags
Andrei
see at my.signature.com
Sun Jun 8 12:29:31 EDT 2003
Originally posted by Fredrik Lundh
> Andrei wrote:
>
> > I'm working on an RSS aggregator and I'd like to replace all img-tags in
> > a piece of html with links to the image, thereby using the alt-text of
> > the img as link text (if present). The rest of the html, including tags,
> > should stay as-is. I'm capable of doing this in what feels like the dumb
> > way (parsing it with regexes for example, or plain old string splitting
> > and rejoining), but I have this impression the HTMLParser or htmllib
> > module should be able to help me with this task.
> > However, I can't figure out how (if?) I can make a parser do this. Does
> > the formatter module fit in here somewhere? The docs, the effbot's guide
> > and the posts regarding html only seem to highlight getting data out of
> > the html (retrieving links seems particularly popular), not replacing
> > tags with other ones.
>
> the term "parser" usually refers to a piece of software that reads a
> character stream, and turns it into some other data structure.
>
> if you want to modify a character stream, you have to combine the
> parser with code that turns that data structure back to a character
> stream.
>
> the "Using the sgmllib Module to Filter SGML Documents" example in
> chapter 5 of my "Python Standard Library" book does exactly that:
>
> http://www.oreilly.com/catalog/pythonsl/chapter/ch05.html
> http://www.effbot.org/zone/librarybook-index.htm (pdf)
>
> (you can use a similar approach with HTMLParser, but htmllib is
> designed for HTML formatting, not HTML parsing, and is not the
> right tool for the task)
Thanks, I'll look into sgmllib then (I already have that chapter on my
HD :), but it didn't really occur to me to look at sgml).
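
For reference, here is a minimal sketch of the parse-and-re-emit approach
described in the quoted reply. It is written against Python 3's
html.parser.HTMLParser rather than sgmllib (sgmllib was removed in Python 3),
so it is not the book's example. The names ImgToLinkFilter and replace_images
are hypothetical, and falling back to the image URL when there is no alt text
is my own assumption:

```python
from html import escape
from html.parser import HTMLParser


class ImgToLinkFilter(HTMLParser):
    """Echo the HTML back out, turning <img> tags into links (hypothetical name)."""

    def __init__(self):
        # convert_charrefs=False routes entities such as &amp; to the
        # handle_entityref/handle_charref hooks so they can be echoed as-is.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _img_link(self, attrs):
        attrs = dict(attrs)
        src = attrs.get("src") or ""
        # Assumption: use the image URL as link text when alt is missing/empty.
        text = attrs.get("alt") or src
        return '<a href="%s">%s</a>' % (escape(src, quote=True), escape(text))

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.out.append(self._img_link(attrs))
        else:
            # Reuse the original source text of the tag unchanged.
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # XHTML-style tags such as <img ... /> or <br/>.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag != "img":
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    def handle_pi(self, data):
        self.out.append("<?%s>" % data)


def replace_images(html_text):
    """Return html_text with every <img> replaced by a link (hypothetical helper)."""
    parser = ImgToLinkFilter()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(replace_images('<p>Hi <img src="a.png" alt="A &amp; B"> there</p>'))
```

Start tags are echoed from their original source text, so attributes survive
untouched. End tags are rebuilt from the lowercased tag name, so the output
can differ from the input in letter case.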
--
Contact info (decode with rot13): cebwrpg5 at bcrenznvy.pbz
Fcnzserr! Cyrnfr qb abg hfr va choyvp zrffntrf. V ernq gur yvfg, ab arrq gb PP.