Html: replacing tags
Walter Dörwald
walter at livinglogic.de
Tue Jun 10 15:48:21 EDT 2003
Andrei wrote:
> Hello,
>
> I'm working on an RSS aggregator and I'd like to replace all img-tags in
> a piece of html with links to the image, thereby using the alt-text of
> the img as link text (if present). The rest of the html, including tags,
> should stay as-is. I'm capable of doing this in what feels like the dumb
> way (parsing it with regexes for example, or plain old string splitting
> and rejoining), but I have this impression the HTMLParser or htmllib
> module should be able to help me with this task.
>
> However, I can't figure out how (if?) I can make a parser do this. Does
> the formatter module fit in here somewhere? The docs, the effbot's guide
> and the posts regarding html only seem to highlight getting data out of
> the html (retrieving links seems particularly popular), not replacing
> tags with other ones.
You might want to use XIST for that. Code that does what you want with
with the Python homepage could look like this:
---
from ll.xist import parsers, converters
from ll.xist.ns import html
def fiximg(node):
if isinstance(node, html.img):
node = html.a(node["alt"], href=node["src"])
return node
e = parsers.parseURL("http://www.python.org", tidy=True)
e = e.mapped(fiximg)
print e.asBytes()
---
HTH,
Walter Dörwald
More information about the Python-list
mailing list