Sort by domain name?
Paul Rubin
http
Mon Oct 2 12:06:00 EDT 2006
"js " <ebgssth at gmail.com> writes:
> All I want to do is to sort out a list of url by companyname,
> like oreilly, ask, skype, amazon, google and so on, to find out
> how many company's url the list contain.
Here's a function I used to use. It makes no attempt to be
exhaustive, but did a reasonable job on the domains I cared about at
the time:
import re

def host_domain(hostname):
    parts = hostname.split('.')
    if parts[-1] in ('au', 'uk', 'nz', 'za', 'jp', 'br'):
        # www.foobar.co.uk, etc: keep three labels
        host_len = 3
    elif len(parts) == 4 and re.match(r'^[\d.]+$', hostname):
        host_len = 4   # 2.3.4.5 numeric address: keep all four parts
    else:
        host_len = 2   # www.foobar.com => foobar.com
    d = '.'.join(parts[-host_len:])
    # print 'host_domain:', hostname, '=>', d
    return d
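
To get from there to "how many URLs per company", something along these lines might work. It is only a sketch: count_domains is a name I made up for illustration, the sample URLs are invented, and it assumes Python 3 (urllib.parse and collections.Counter). The host_domain heuristic is repeated so the snippet runs on its own.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

def host_domain(hostname):
    # Same heuristic as the function above, repeated so this runs standalone.
    parts = hostname.split('.')
    if parts[-1] in ('au', 'uk', 'nz', 'za', 'jp', 'br'):
        host_len = 3
    elif len(parts) == 4 and re.match(r'^[\d.]+$', hostname):
        host_len = 4
    else:
        host_len = 2
    return '.'.join(parts[-host_len:])

def count_domains(urls):
    # Hypothetical helper: tally URLs by the domain host_domain() reports.
    counts = Counter()
    for url in urls:
        hostname = urlsplit(url).hostname  # None when the URL has no host part
        if hostname:
            counts[host_domain(hostname)] += 1
    return counts

# Invented sample data, for illustration only.
urls = [
    'http://www.oreilly.com/catalog/',
    'http://safari.oreilly.com/',
    'http://www.amazon.co.uk/books',
    'http://www.google.com/search?q=python',
    'http://10.2.3.4/index.html',
]
counts = count_domains(urls)
for domain, n in counts.most_common():
    print(n, domain)
```

The same caveat applies as before: the country-code branch always keeps three labels, so a host like www.example.jp stays www.example.jp rather than collapsing to example.jp.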