Extracting real-domain-name (without sub-domains) from a given URL
s.selvamsiva at gmail.com
Tue Jan 13 09:28:36 CET 2009
On Tue, Jan 13, 2009 at 1:50 PM, Chris Rebert <clp2 at rebertia.com> wrote:
> On Mon, Jan 12, 2009 at 11:46 PM, S.Selvam Siva <s.selvamsiva at gmail.com> wrote:
> > Hi all,
> > I need to extract the domain name (without sub-domains) from a given URL.
> > With urlparse, I am only able to fetch the host name, which still
> > includes the sub-domain.
> > e.g.:
> > http://feeds.huffingtonpost.com/posts/ , http://www.huffingtonpost.de/,
> > .... all must lead to huffingtonpost.com or huffingtonpost.de
> > Please suggest some ideas for solving this problem.
> That would require (pardon the pun) domain-specific logic. For most
> TLDs (e.g. .com, .org) the domain name is just blah.com, blah.org,
> etc. But for ccTLDs, often only second-level registrations are
> allowed, e.g. for www.bbc.co.uk, so the main domain name would be
> bbc.co.uk. I think a few TLDs have even more complicated rules.
> I doubt anyone's created a general ready-made solution for this; you'd
> have to code it yourself.
> To handle the common case, you can cheat and just .split() at the
> periods and then slice and rejoin the list of domain parts, ex:
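A minimal sketch of the split-and-rejoin shortcut described above, written for Python 3, where the functions of the `urlparse` module live in `urllib.parse`. This is not the code from the original message; the function name `naive_domain` is hypothetical, and keeping exactly two labels is an assumption that holds only for the common case.

```python
from urllib.parse import urlsplit


def naive_domain(url):
    """Return the last two dot-separated labels of the URL's host name.

    Hypothetical helper: it assumes the registered domain is always
    "name.tld", which is wrong for suffixes such as co.uk.
    """
    # .hostname is lower-cased and excludes any port or credentials.
    host = urlsplit(url).hostname or ""
    parts = host.split(".")
    # Slice off everything except the last two labels and rejoin them.
    return ".".join(parts[-2:])
```

For http://feeds.huffingtonpost.com/posts/ this gives huffingtonpost.com, but for http://www.bbc.co.uk/ it gives co.uk, which is the ccTLD problem mentioned above.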
Thank you, Chris Rebert.
Actually, I tried domain-specific logic: I kept a list of 200 TLDs and
suffixes such as .com, .co.in and .co.uk, and used it to extract the
domain name.
But my boss wants a more reliable solution than this method. Anyway, I
will try to find some alternative solution.
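The suffix-table method mentioned above can be sketched roughly as follows. The names `KNOWN_SUFFIXES` and `registered_domain` are hypothetical, and the five-entry set is only an illustrative stand-in for the 200-entry list; the result is only as reliable as the table behind it.

```python
from urllib.parse import urlsplit

# Illustrative subset only (an assumption); a real table needs many more entries.
KNOWN_SUFFIXES = {"com", "org", "de", "co.uk", "co.in"}


def registered_domain(url, suffixes=KNOWN_SUFFIXES):
    """Return the label just left of the longest known suffix, plus that suffix.

    Returns None when no suffix in the table matches the host name.
    """
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    # Walk from the longest candidate suffix to the shortest, so that
    # "co.uk" is matched before a shorter suffix could be.
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in suffixes:
            return ".".join(labels[i - 1:])
    return None
```

With this table, http://www.bbc.co.uk/ yields bbc.co.uk and both huffingtonpost URLs yield their registered domains, while a host with an unlisted suffix yields None instead of a guess. A maintained list such as Mozilla's Public Suffix List could supply the entries, though its wildcard and exception rules would need handling that this sketch does not attempt.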