Extracting real-domain-name (without sub-domains) from a given URL

S.Selvam Siva s.selvamsiva at gmail.com
Tue Jan 13 09:28:36 CET 2009


On Tue, Jan 13, 2009 at 1:50 PM, Chris Rebert <clp2 at rebertia.com> wrote:
>
> On Mon, Jan 12, 2009 at 11:46 PM, S.Selvam Siva <s.selvamsiva at gmail.com> wrote:
> > Hi all,
> >
> >   I need to extract the domain name from a given URL, without
> > sub-domains. With urlparse, I can only get the full host name, which
> > still includes the sub-domain.
> >
> > eg:
> >   http://feeds.huffingtonpost.com/posts/ , http://www.huffingtonpost.de/,
> > .... all must lead to huffingtonpost.com or huffingtonpost.de
> >
> > Please suggest some ideas for solving this problem.
>
> That would require (pardon the pun) domain-specific logic. For most
> TLDs (e.g. .com, .org) the domain name is just blah.com, blah.org,
> etc. But many ccTLDs only accept registrations under a second-level
> domain such as .co.uk, so for www.bbc.co.uk the main domain name
> would be bbc.co.uk. I think a few TLDs have even more complicated rules.
>
> I doubt anyone's created a general ready-made solution for this; you'd
> have to code it yourself.
> To handle the common case, you can cheat: .split() the host name at
> the periods, then slice and rejoin the list of domain parts, e.g.:
> '.'.join(domain.split('.')[-2:])
>
> Cheers,
> Chris
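
A minimal sketch of the split-and-rejoin shortcut described above, to show
where it works and where it breaks. The function name naive_domain is
made up for illustration, and the import assumes Python 3 (urllib.parse);
on the Python 2 of this thread the same function lives in the urlparse
module.

```python
from urllib.parse import urlparse


def naive_domain(url):
    """Return the last two dot-separated labels of the URL's host name."""
    # hostname is None when the URL has no network location part.
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])


# Works for plain TLDs such as .com and .de:
print(naive_domain("http://feeds.huffingtonpost.com/posts/"))
print(naive_domain("http://www.huffingtonpost.de/"))
# Breaks for registrations under a second-level suffix such as .co.uk:
print(naive_domain("http://www.bbc.co.uk/"))
```

The last call yields co.uk rather than bbc.co.uk, which is the case the
shortcut cannot handle without extra knowledge about the suffix.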


Thank you, Chris Rebert.
  Actually, I have already tried domain-specific logic: I kept a list of
200 TLDs such as .com, .co.in and .co.uk and used it to extract the
domain name.
  But my boss wants a more reliable solution than this, so I will keep
looking for an alternative.
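
For reference, the suffix-list approach can be sketched roughly as below.
This is only an illustration, not the code I actually used: the names
KNOWN_SUFFIXES and registered_domain are hypothetical, the table holds a
handful of entries instead of 200, and the import assumes Python 3.

```python
from urllib.parse import urlparse

# Hypothetical, deliberately tiny suffix table; a real one needs far more entries.
KNOWN_SUFFIXES = {"com", "org", "de", "co.uk", "co.in"}


def registered_domain(url, suffixes=KNOWN_SUFFIXES):
    """Return the known suffix plus one label to its left, or None if no match."""
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    # Walk from the longest candidate suffix to the shortest, so that
    # "co.uk" is matched before a bare "uk" could be.
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in suffixes:
            return ".".join(labels[i - 1:])
    return None


print(registered_domain("http://www.bbc.co.uk/"))
print(registered_domain("http://feeds.huffingtonpost.com/posts/"))
```

The result is only as reliable as the table: any suffix missing from it
makes the function return None, which is the weakness described above.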


--
Yours,
S.Selvam



More information about the Python-list mailing list