changing URLs in webpages, python solutions?

Stefan Behnel stefan_ml at behnel.de
Sun Jan 18 11:40:22 EST 2009


Simon Forman wrote:
> I want to take a webpage, find all URLs (links, img src, etc.) and
> rewrite them in-place, and I'd like to do it in python (pure python
> preferred.)

lxml.html has functions specifically for this problem.

http://codespeak.net/lxml/lxmlhtml.html#working-with-links

Code would be something like

	html_doc = lxml.html.parse(b"http://.../xyz.html")
	html_doc.rewrite_links( ... )
	print( lxml.html.tostring(html_doc) )

It also handles links in CSS or JavaScript, as well as broken HTML documents.

Stefan



More information about the Python-list mailing list