Parsing html links

Fredrik Lundh fredrik at effbot.org
Wed Jan 24 06:02:21 EST 2001


"Joonas" wrote:
> How can I make a script im Python that changes all
> 'http://myaddress.spam ' into '<a
> href="http://myaddress.spam">http://myaddress.spam<a/>'

recently seen on comp.lang.python:

Subject: Re: URL replacement in text
Date: Tue, 16 Jan 2001

Steve Holden wrote:
> Note that this won't cope with some of the more pathological URLs (such as
> those with CGI arguments [http://system/cgi?arg1=val1] or those which link
> to a named anchor in the target page [http://system/pageref#target-name]).

import re

links = re.compile(r"(?i)(?:http|ftp|gopher):[\w.~/%#?&=-]+")

def fixlink(m):
    href = m.group(0)
    return "<a href='%s'>%s</a>" % (href, href)

sometext = """
here's another link: http://www.pythonware.com?FOO=1&bar=10#BAR
"""

print links.sub(fixlink, sometext)

Cheers /F





More information about the Python-list mailing list