Extracting real-domain-name (without sub-domains) from a given URL

Chris Rebert clp2 at rebertia.com
Tue Jan 13 03:20:41 EST 2009


On Mon, Jan 12, 2009 at 11:46 PM, S.Selvam Siva <s.selvamsiva at gmail.com> wrote:
> Hi all,
>
>   I need to extract the domain-name from a given url(without sub-domains).
> With urlparse, i am able to fetch only the domain-name(which includes the
> sub-domain also).
>
> eg:
>   http://feeds.huffingtonpost.com/posts/ , http://www.huffingtonpost.de/,
> .... all must lead to huffingtonpost.com or huffingtonpost.de
>
> Please suggest me some ideas regarding this problem.

That would require (pardon the pun) domain-specific logic. For most
TLDs (e.g. .com, .org) the domain name is just blah.com, blah.org,
etc. But ccTLDs often only allow registrations under a second-level
domain (e.g. .co.uk), so for www.bbc.co.uk the main domain name would
be bbc.co.uk. I think a few TLDs have even more complicated rules.

I doubt anyone's created a general ready-made solution for this; you'd
have to code it yourself.
To handle the common case, you can cheat: .split() the domain at the
periods, then slice and rejoin the list of parts, e.g.:
'.'.join(domain.split('.')[-2:])
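
Here is a rough sketch of that cheat, combined with a small special
case for suffixes like .co.uk. It is written for Python 3, where the
urlparse module lives at urllib.parse. The main_domain() name and the
TWO_LABEL_SUFFIXES set are made up for illustration, and the set is
deliberately incomplete; you would have to extend it for the TLDs you
actually care about.

```python
from urllib.parse import urlsplit

# Hypothetical, hand-maintained and incomplete set of two-label suffixes
# under which registrations happen at the third level.
TWO_LABEL_SUFFIXES = {"co.uk", "org.uk", "com.au", "co.jp"}


def main_domain(url):
    """Return a best-guess registered domain for url (no sub-domains)."""
    # .hostname drops any port or credentials and lowercases the host.
    host = urlsplit(url).hostname or ""
    parts = host.split(".")
    # Keep three labels when the last two form a known suffix, else two.
    keep = 3 if ".".join(parts[-2:]) in TWO_LABEL_SUFFIXES else 2
    return ".".join(parts[-keep:])


print(main_domain("http://feeds.huffingtonpost.com/posts/"))
print(main_domain("http://www.huffingtonpost.de/"))
print(main_domain("http://www.bbc.co.uk/news"))
```

For the three URLs above this gives huffingtonpost.com,
huffingtonpost.de and bbc.co.uk. Anything whose suffix is not in the
set falls back to the plain last-two-labels rule, so it will still be
wrong for TLDs with rules the set doesn't cover.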

Cheers,
Chris

-- 
Follow the path of the Iguana...
http://rebertia.com


