Use Regular Expressions to extract URLs

Jimbo nilly16 at yahoo.com
Fri Apr 30 08:53:06 CEST 2010


Hello

I am using regular expressions to grab URLs from a string (of HTML
code). It is going well and I seem to be grabbing the full URL,
[b]but[/b]
I also get a '"' character at the end of it. Do you know how I can get
rid of the '"' character at the end of my URL?

[b]Example of problem:[/b]
[quote]
I get this when I extract a URL from a string:
http://google.com"

I want to get this:
http://google.com
[/quote]

My regular expression:
[code]
import re

def find_urls(string):
    """Extract all URLs from a string and return them as a list."""

    # The trailing ["] is part of the match, so each URL ends with a '"'.
    url_list = re.findall(r'(?:http://|www.).*?["]', string)
    return url_list
[/code]
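
One possible way to avoid the trailing '"' is to stop the match before the quote instead of consuming it. The following is only a minimal sketch, not a definitive fix: the function name find_urls_without_quote is made up for illustration, and it assumes a URL ends at the first double quote, whitespace, or angle bracket. Unlike the pattern above, it does not require a closing quote to be present at all.

```python
import re

def find_urls_without_quote(string):
    """Return URL-like substrings without the closing quote (hypothetical helper)."""
    # [^"\s<>]+ consumes characters up to, but not including, the first
    # double quote, whitespace, or angle bracket, so the '"' is never matched.
    # The dot in 'www\.' is escaped so that it matches a literal '.' only.
    return re.findall(r'(?:http://|www\.)[^"\s<>]+', string)

print(find_urls_without_quote('<a href="http://google.com">Google</a>'))
```

If the closing quote has to stay a required part of the pattern, wrapping the URL portion in a capturing group, as in r'((?:http://|www\.).*?)"', should also work, since re.findall returns only the group when the pattern contains exactly one.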


