Html: replacing tags

Walter Dörwald walter at livinglogic.de
Tue Jun 10 15:48:21 EDT 2003


Andrei wrote:

> Hello,
> 
> I'm working on an RSS aggregator and I'd like to replace all img-tags in
> a piece of html with links to the image, thereby using the alt-text of
> the img as link text (if present). The rest of the html, including tags,
> should stay as-is. I'm capable of doing this in what feels like the dumb
> way (parsing it with regexes for example, or plain old string splitting
> and rejoining), but I have this impression the HTMLParser or htmllib
> module should be able to help me with this task.
> 
> However, I can't figure out how (if?) I can make a parser do this. Does
> the formatter module fit in here somewhere? The docs, the effbot's guide
> and the posts regarding html only seem to highlight getting data out of
> the html (retrieving links seems particularly popular), not replacing
> tags with other ones.

You might want to use XIST for that. Code that does what you want with
with the Python homepage could look like this:

---
from ll.xist import parsers, converters
from ll.xist.ns import html

def fiximg(node):
    if isinstance(node, html.img):
       node = html.a(node["alt"], href=node["src"])
    return node

e = parsers.parseURL("http://www.python.org", tidy=True)

e = e.mapped(fiximg)

print e.asBytes()
---

HTH,
    Walter Dörwald





More information about the Python-list mailing list