how to strip the domain name in python?

Alex Martelli aleax at mac.com
Sat Apr 14 06:58:13 CEST 2007


<Marko.Cain.23 at gmail.com> wrote:

> Hi,
> 
> I have a list of url names like this, and I am trying to strip out the
> domain name using the following code:
> 
> http://www.cnn.com
> www.yahoo.com
> http://www.ebay.co.uk
> 
> pattern = re.compile("http:\\\\(.*)\.(.*)", re.S)
> match = re.findall(pattern, line)
> 
> if (match):
>         s1, s2 = match[0]
> 
>         print s2
> 
> but none of the sites matched. Can you please tell me what I am
> missing?

You're using backslashes in your RE pattern, to start with, while
the URLs contain forward slashes (or, in the case of the second one,
no slashes at all).
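To see the first problem concretely, here is a small Python 3 sketch (not from the original thread). It runs the pattern as written, expressed as an equivalent raw string, next to a variant with forward slashes:

```python
import re

urls = ["http://www.cnn.com", "www.yahoo.com", "http://www.ebay.co.uk"]

# The original pattern, written as a raw string so the regex text is identical:
# "\\\\" in a normal string is two backslash characters, which the regex
# engine reads as one literal backslash, so this can only match "http:\...".
backslash_pattern = re.compile(r"http:\\(.*)\.(.*)", re.S)

# The same pattern with the forward slashes the URLs actually contain.
slash_pattern = re.compile(r"http://(.*)\.(.*)", re.S)

for url in urls:
    print(url, backslash_pattern.findall(url), slash_pattern.findall(url))
```

The backslash version matches none of the URLs. The forward-slash version still misses www.yahoo.com, and where it does match, the greedy groups split at the last dot, giving ('www.cnn', 'com') and ('www.ebay.co', 'uk'). So s2 would only be the final label, not the host.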

Anyway, forget REs and use the standard library module urlparse,
specifically its urlparse.urlsplit function.
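Here is a minimal sketch of that approach for Python 3, where the module is called urllib.parse (in Python 2 it is urlparse). The helper host_of is hypothetical. It assumes that any string without a scheme, like www.yahoo.com, is a bare host name. Because urlsplit only fills in netloc when the network location is introduced by "//", the helper prepends "//" in that case:

```python
from urllib.parse import urlsplit  # Python 2: from urlparse import urlsplit


def host_of(url):
    """Return the host part of url, or None if there is none.

    Hypothetical helper: a string with no "scheme://" and no leading "//"
    is assumed to be a bare host such as "www.yahoo.com", so "//" is
    prepended to make urlsplit treat it as the netloc instead of the path.
    """
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    # .hostname is the netloc without any user info or port, lowercased.
    return urlsplit(url).hostname


urls = ["http://www.cnn.com", "www.yahoo.com", "http://www.ebay.co.uk"]
for url in urls:
    print(host_of(url))
```

This yields the full host (for example www.ebay.co.uk). Cutting it down to a registered domain such as ebay.co.uk needs a public suffix list, which the standard library does not provide.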


Alex



More information about the Python-list mailing list