how to strip the domain name in python?
Michael Bentley
michael at jedimindworks.com
Sun Apr 15 20:57:36 EDT 2007
On Apr 15, 2007, at 4:24 PM, Marko.Cain.23 at gmail.com wrote:
> On Apr 15, 11:57 am, Marc 'BlackJack' Rintsch <bj_... at gmx.net> wrote:
>> In <1176654669.737355.78... at y5g2000hsa.googlegroups.com>,
>> Marko.Cain.23
>> wrote:
>>
>>
>>
>>> On Apr 14, 10:36 am, Marko.Cain... at gmail.com wrote:
>>>> On Apr 14, 12:02 am, Michael Bentley <mich... at jedimindworks.com>
>>>> wrote:
>>
>>>>> On Apr 13, 2007, at 11:49 PM, Marko.Cain... at gmail.com wrote:
>>
>>>>>> Hi,
>>
>>>>>> I have a list of url names like this, and I am trying to strip
>>>>>> out the
>>>>>> domain name using the following code:
>>
>>>>>> http://www.cnn.com
>>>>>> www.yahoo.com
>>>>>> http://www.ebay.co.uk
>>
>>>>>> pattern = re.compile("http:\\\\(.*)\.(.*)", re.S)
>>>>>> match = re.findall(pattern, line)
>>
>>>>>> if (match):
>>>>>>     s1, s2 = match[0]
>>
>>>>>> print s2
>>
>>>>>> but none of the site matched, can you please tell me what am i
>>>>>> missing?
>>
>>>>> change re.compile("http:\\\\(.*)\.(.*)", re.S) to
>>>>> re.compile("http:\/\/(.*)\.(.*)", re.S)
>>
>>>> Thanks. I try this:
>>
>>>> but when the 'line' is http://www.cnn.com, I get 's2' com,
>>>> but i want 'cnn.com' (everything after the first '.'), how can I do
>>>> that?
>>
>>>> pattern = re.compile("http:\/\/(.*)\.(.*)", re.S)
>>
>>>> match = re.findall(pattern, line)
>>
>>>> if (match):
>>
>>>>     s1, s2 = match[0]
>>
>>>> print s2
>>
>>> Can anyone please help me with my problem? I still can't solve it.
>>
>>> Basically, I want to strip out the text after the first '.' in url
>>> address:
>>
>>> http://www.cnn.com -> cnn.com
>>
>> from urlparse import urlsplit
>>
>> def get_domain(url):
>>     net_location = urlsplit(url)[1]
>>     return '.'.join(net_location.rsplit('.', 2)[-2:])
>>
>> def main():
>>     print get_domain('http://www.cnn.com')
>>
>> Ciao,
>> Marc 'BlackJack' Rintsch
>
> Thanks for your help.
>
> But if the input string is "http://www.ebay.co.uk/", I only get
> "co.uk"
>
> how can I change it so that it works for both www.ebay.co.uk and
> www.cnn.com?
>
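The quoted get_domain keeps only the last two dot-separated labels of the host, which is why www.ebay.co.uk comes back as co.uk. A small Python 3 check of that behaviour (last_two_labels is a hypothetical helper name used only for illustration, not something from the thread):

```python
def last_two_labels(host):
    # rsplit('.', 2) splits from the right at most twice,
    # so 'www.ebay.co.uk' becomes ['www.ebay', 'co', 'uk'].
    return '.'.join(host.rsplit('.', 2)[-2:])

if __name__ == '__main__':
    for host in ['www.cnn.com', 'www.ebay.co.uk']:
        print(host, '->', last_two_labels(host))
```

Counting labels from the right cannot tell a one-part suffix like com from a two-part suffix like co.uk, so the version below strips a leading www instead.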
from urlparse import urlsplit

def get_domain(url):
    net_location = (
        urlsplit(url)[1]
        and urlsplit(url)[1].split('.')
        or urlsplit(url)[2].split('.')
    ) # tricksy way to get long line into email
    if net_location[0].lower() == 'www':
        net_location = net_location[1:]
    return '.'.join(net_location)

def main():
    testItems = ['http://www.cnn.com',
                 'www.yahoo.com',
                 'http://www.ebay.co.uk']
    for testItem in testItems:
        print get_domain(testItem)

if __name__ == '__main__':
    main()
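
For anyone running this on Python 3, where urlparse lives in urllib.parse and print is a function, a rough equivalent might look like the sketch below. It assumes the only prefix to remove is a leading www label, and it additionally cuts the path fallback at the first '/', which the code above does not do.

```python
from urllib.parse import urlsplit

def get_domain(url):
    """Return the host part of url with a leading 'www' label removed."""
    parts = urlsplit(url)
    # Without a scheme, urlsplit leaves netloc empty and puts the host
    # at the start of the path, so fall back to the first path segment.
    host = parts.netloc or parts.path.split('/', 1)[0]
    labels = host.split('.')
    if labels[0].lower() == 'www':
        labels = labels[1:]
    return '.'.join(labels)

if __name__ == '__main__':
    for item in ['http://www.cnn.com', 'www.yahoo.com', 'http://www.ebay.co.uk']:
        print(get_domain(item))
```

This sketch leaves any port or user:password part of the netloc in place, and it will not reduce hosts such as news.bbc.co.uk to bbc.co.uk, since it only looks for a literal www prefix.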