Regex for URL extracting
Chris Mellon
arkanes at gmail.com
Wed Jan 24 15:05:31 EST 2007
On 24 Jan 2007 11:07:49 -0800, Paul McGuire <ptmcg at austin.rr.com> wrote:
> On Jan 24, 10:20 am, "Johny" <pyt... at hope.cz> wrote:
> > Does anyone know about a good regular expression for URL extracting?
> >
> > J.
> Google turns this up:
>
> http://geekswithblogs.net/casualjim/archive/2005/12/01/61722.aspx
>
> But I've seen other re's for this problem that are hundreds of
> characters long.
>
> -- Paul
>
> --
> http://mail.python.org/mailman/listinfo/python-list
>
These are the regexps that gnome-terminal uses for its URL
auto-recognition, and I have shamelessly stolen them for use in one of
my own apps:
import re

urlfinders = [
    # host (dotted-quad IP, scheme://, or www./ftp. prefix) followed by a path
    re.compile("([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}|(((news|telnet|nntp|file|http|ftp|https)://)|(www|ftp)[-A-Za-z0-9]*\\.)[-A-Za-z0-9\\.]+)(:[0-9]*)?/[-A-Za-z0-9_\\$\\.\\+\\!\\*\\(\\),;:@&=\\?/~\\#\\%]*[^]'\\.}>\\),\\\"]"),
    # the same hosts without a path
    re.compile("([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}|(((news|telnet|nntp|file|http|ftp|https)://)|(www|ftp)[-A-Za-z0-9]*\\.)[-A-Za-z0-9\\.]+)(:[0-9]*)?"),
    # local paths starting with ~/, / or ./, allowing backslash-escaped spaces
    re.compile("(~/|/|\\./)([-A-Za-z0-9_\\$\\.\\+\\!\\*\\(\\),;:@&=\\?/~\\#\\%]|\\\\ )+"),
    # e-mail addresses, optionally prefixed with mailto: (\b is Python's word boundary)
    re.compile("\\b((mailto:)|)[-A-Za-z0-9\\.]+@[-A-Za-z0-9\\.]+"),
]
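A minimal sketch, assuming you want every link in a block of text rather than a yes/no test. Because the patterns overlap (the path-bearing pattern and the host-only pattern both match the start of `http://example.com/docs`), a simple loop over them reports the same URL twice, so this version keeps only the longest match at each position. `PATTERNS` and `extract_urls` are hypothetical names, and the patterns are a trimmed raw-string adaptation of the ones above (IP-address and local-path patterns dropped, whitespace excluded from the last URL character, trailing dots kept off e-mail hosts), not gnome-terminal's own expressions.

```python
import re

# Hypothetical trimmed patterns adapted from the list above, as raw strings.
# Assumptions: IP-address and local-path patterns are omitted, whitespace may
# not end a URL, and an e-mail host must end in a letter or digit.
PATTERNS = [
    # scheme:// or www./ftp. host followed by a path
    re.compile(r"(((news|telnet|nntp|file|http|ftp|https)://)|(www|ftp)[-A-Za-z0-9]*\.)"
               r"[-A-Za-z0-9.]+(:[0-9]*)?/[-A-Za-z0-9_$.+!*(),;:@&=?/~#%]*[^]'.}>), \"\s]"),
    # the same hosts without a path
    re.compile(r"(((news|telnet|nntp|file|http|ftp|https)://)|(www|ftp)[-A-Za-z0-9]*\.)"
               r"[-A-Za-z0-9.]+(:[0-9]*)?"),
    # e-mail address, optionally prefixed with mailto:
    re.compile(r"\b(mailto:)?[-A-Za-z0-9.]+@[-A-Za-z0-9.]*[A-Za-z0-9]"),
]


def extract_urls(text):
    """Return URL-like substrings of text, keeping the longest match per position."""
    spans = []
    for pattern in PATTERNS:
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end()))
    # Earliest start first; for equal starts, the longer match comes first.
    spans.sort(key=lambda span: (span[0], -span[1]))
    found = []
    last_end = -1
    for start, end in spans:
        # Skip any match that overlaps one already kept.
        if start >= last_end:
            found.append(text[start:end])
            last_end = end
    return found


print(extract_urls("Docs at http://example.com/docs. Mail joe@example.com or visit www.python.org today."))
# prints ['http://example.com/docs', 'joe@example.com', 'www.python.org']
```

Sorting by start offset with the longest match first is what lets the host-only pattern serve as a fallback for URLs that have no path, while never duplicating a URL that the path-bearing pattern already caught.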