Extracting real-domain-name (without sub-domains) from a given URL

Steve Holden steve at holdenweb.com
Tue Jan 13 06:43:37 EST 2009


S.Selvam Siva wrote:
> On Tue, Jan 13, 2009 at 1:50 PM, Chris Rebert <clp2 at rebertia.com> wrote:
>> On Mon, Jan 12, 2009 at 11:46 PM, S.Selvam Siva <s.selvamsiva at gmail.com> wrote:
>>> Hi all,
>>>
>>>   I need to extract the domain name from a given URL (without sub-domains).
>>> With urlparse, I am able to fetch only the full domain name (which also
>>> includes the sub-domain).
>>>
>>> eg:
>>>   http://feeds.huffingtonpost.com/posts/ , http://www.huffingtonpost.de/,
>>> .... all must lead to huffingtonpost.com or huffingtonpost.de
>>>
>>> Please suggest some ideas for this problem.
>> That would require (pardon the pun) domain-specific logic. For most
>> TLDs (e.g. .com, .org) the domain name is just blah.com, blah.org,
>> etc. But for ccTLDs, registrations are often allowed only under a
>> second-level domain, e.g. for www.bbc.co.uk the main domain name would
>> be bbc.co.uk. I think a few TLDs have even more complicated rules.
>>
>> I doubt anyone's created a general ready-made solution for this; you'd
>> have to code it yourself.
>> To handle the common case, you can cheat and just .split() at the
>> periods and then slice and rejoin the list of domain parts, e.g.:
>> '.'.join(domain.split('.')[-2:])
>>
>> Cheers,
>> Chris
> 
> 
> Thank you Chris Rebert,
>   Actually, I tried domain-specific logic, using a list of 200 TLDs like
> .com, co.in, co.uk to extract the domain name.
>   But my boss wants a more reliable solution than this method. Anyway, I
> will try to find some alternative solution.
> 
If you post a good first try and open the source, I would be surprised
if others did not join your effort to establish suitable rules. This is
something that many people could doubtless use.
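
As a rough illustration of what such a first try might look like, here is
a minimal sketch that combines the split-and-rejoin trick quoted above
with a small table of multi-label suffixes. The function name
registered_domain and the MULTI_LABEL_SUFFIXES set are hypothetical, the
four suffixes are only a sample, and the code assumes Python 3, where the
old urlparse module lives in urllib.parse. It is not a complete rule set.

```python
from urllib.parse import urlparse

# Hypothetical, deliberately tiny sample of suffixes made of two labels.
# A usable table would need far more entries than this.
MULTI_LABEL_SUFFIXES = {"co.uk", "co.in", "org.uk", "com.au"}


def registered_domain(url):
    """Return the domain without sub-domains, or None if no host is found."""
    host = urlparse(url).hostname
    if not host:
        return None
    labels = host.lower().rstrip(".").split(".")
    if len(labels) < 2:
        return None
    # Keep three labels when the last two form a known multi-label suffix.
    if len(labels) >= 3 and ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES:
        return ".".join(labels[-3:])
    # Otherwise fall back to the common case: the last two labels.
    return ".".join(labels[-2:])


for url in ("http://feeds.huffingtonpost.com/posts/",
            "http://www.huffingtonpost.de/",
            "http://www.bbc.co.uk/"):
    print(url, "->", registered_domain(url))
```

Any suffix missing from the table silently falls back to the two-label
rule, and IP-address hosts are not handled, which is why the suffix rules
themselves are the part that would benefit from a shared effort.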

regards
 Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/



