Use Regular Expressions to extract URL's
Steven D'Aprano
steve at REMOVE-THIS-cybersource.com.au
Fri Apr 30 05:03:30 EDT 2010
On Thu, 29 Apr 2010 23:53:06 -0700, Jimbo wrote:
> Hello
>
> I am using regular expressions to grab URLs from a string (of HTML
> code). I am getting on very well & I seem to be grabbing the full URL,
> but I also get a '"' character at the end of it. Do you know how I can
> get rid of the '"' char at the end of my URL?
Live dangerously and just drop the last character from string s no matter
what it is:
s = s[:-1]
Or be a little more cautious and test first:
if s.endswith('"'):
    s = s[:-1]
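As a minimal sketch, the cautious version can be wrapped in a small helper so it is easy to reuse. The function name strip_trailing_quote and the sample URL are made up for illustration:

```python
def strip_trailing_quote(s):
    """Return s without one trailing double-quote character, if present."""
    if s.endswith('"'):
        return s[:-1]
    # Anything else is returned unchanged, so no real data is lost.
    return s


url = strip_trailing_quote('http://www.example.com/index.html"')
print(url)
```

Unlike the unconditional slice, this leaves a string alone when it does not end in a quote.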
Or fix the problem at the source. Using regexes to parse HTML is always
problematic. You should consider using a proper HTML parser. Otherwise,
try this regex:
r'"(http://(?:www\.)?.*?)"'
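To illustrate both options, here is a rough sketch that assumes Python 3 and only the standard library. The sample HTML, the class name LinkCollector, and the simplified pattern URL_RE are all made up for illustration and are not the only way to do this. In the regex version the quotes are matched but sit outside the capturing group, so the closing '"' never ends up in the result:

```python
import re
from html.parser import HTMLParser

# Made-up sample input, for illustration only.
page = ('<p><a href="http://www.example.com/a">A</a> '
        '<a href="http://example.org/b?x=1">B</a></p>')

# Regex approach (simplified pattern, an assumption): capture everything
# between the double quotes that starts with http://.
URL_RE = re.compile(r'"(http://[^"]*)"')
regex_urls = URL_RE.findall(page)


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; values arrive unquoted.
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value is not None:
                    self.links.append(value)


collector = LinkCollector()
collector.feed(page)
collector.close()

print(regex_urls)
print(collector.links)
```

The parser version also copes with single-quoted or unquoted attribute values, which this regex does not.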
--
Steven