Newbie here... getting a count of repeated instances in a list.
Amy G
amy-g-art at cox.net
Fri Nov 21 19:38:43 EST 2003
I started trying to learn Python today. The program I am trying to write
will open a text file containing email addresses and store them in a list.
Then it will go through them, keeping only the domain portion of each address.
After that it will count the number of times each domain occurs, and if the
count is above a certain threshold, it will add that domain to a list or text
file, or whatever. For now I just have it printing to the screen.
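One way to sketch the domain-extraction step is below. It assumes one address per line and that blank or malformed lines should simply be skipped. The function name `extract_domains` is hypothetical, lowercasing is an added choice (domain names are case-insensitive), and the code uses Python 3 syntax:

```python
def extract_domains(lines):
    """Return the domain part of each email address in `lines`.

    Assumes one address per line; lines without an '@' are skipped.
    Domains are lowercased because domain names are case-insensitive.
    """
    domains = []
    for line in lines:
        address = line.strip()
        if '@' not in address:
            continue  # blank or malformed line
        # Split on the last '@' and keep only the text after it
        domains.append(address.rsplit('@', 1)[1].lower())
    return domains


print(extract_domains(["amy@example.com\n", "\n", "bob@Example.COM\n"]))
# → ['example.com', 'example.com']
```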
This is my code, and it works and does what I want. But I want to do
something with a hash object to make this go a whole lot faster. Any
suggestions would be greatly appreciated.
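Python's built-in hash table is the dictionary (`dict`). One possible reading of the "hash object" idea is a single pass that tallies each domain in a dict and then keeps the domains above the threshold. The sketch below is minimal. The function name `domains_over_threshold` is hypothetical, and the default threshold of 10 comes from the post. Each dict lookup is roughly constant time, so the tally is linear in the number of addresses. By contrast, calling `list.count()` once per element rescans the whole list every time.

```python
def domains_over_threshold(domains, threshold=10):
    """Return the domains that occur more than `threshold` times.

    Uses a dict as a hash table mapping domain -> number of occurrences,
    so the input list is scanned only once.
    """
    counts = {}
    for domain in domains:
        counts[domain] = counts.get(domain, 0) + 1
    # Keep only domains whose tally exceeds the threshold, sorted so the
    # result is stable and easy to read
    return sorted(d for d, n in counts.items() if n > threshold)


sample = ["a.com"] * 12 + ["b.net"] * 3 + ["c.org"] * 11
print(domains_over_threshold(sample))
# → ['a.com', 'c.org']
```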
Thanks,
Amy
P.S. Sorry about the long post. I just really need some help here.
CODE
************************
import sys

# Open the file containing emails and read its contents into a list
with open(sys.argv[1], 'r') as email_file:
    mail_list = email_file.readlines()

def get_domains(email_list):
    # Takes a list of emails and returns the domains only
    domain_list = []  # build a new list instead of overwriting email_list
    line_count = 0
    while line_count < len(email_list):
        email = email_list[line_count].strip()
        if '@' in email:  # skip blank or malformed lines
            domain_list.append(email.split('@', 1)[1])
        line_count = line_count + 1
    return domain_list

def count_domains(domain_list):
    # Takes a list of domains and returns a list of the domains that
    # occur more than <threshold> number of times
    counted_domains = []
    line_count = 0
    threshold = 10
    while line_count < len(domain_list):
        d = domain_list[line_count]
        domain_count = domain_list.count(d)
        if domain_count > threshold:
            counted_domains.append(d)
            # Remove all later instances of this domain once counted;
            # slice assignment leaves the indexes up to line_count intact
            domain_list[line_count + 1:] = [x for x in domain_list[line_count + 1:]
                                            if x != d]
        line_count = line_count + 1
    return counted_domains

domains = get_domains(mail_list)
counted = count_domains(domains)
print(counted)
********************************************
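For comparison, the whole script can be sketched with `collections.Counter`. This is a dict subclass in the standard library (Python 2.7 and later) designed for exactly this kind of tally. The sketch makes two assumptions for illustration and easier testing. First, the function name `frequent_domains` is hypothetical. Second, the function accepts any iterable of lines, so an open file works because iterating over a file yields its lines. The threshold of 10 comes from the post.

```python
import sys
from collections import Counter


def frequent_domains(lines, threshold=10):
    """Count email domains in `lines` and return those seen more than
    `threshold` times as (domain, count) pairs, most frequent first."""
    counts = Counter(
        line.strip().rsplit('@', 1)[1].lower()
        for line in lines
        if '@' in line  # skip blank or malformed lines
    )
    return [(d, n) for d, n in counts.most_common() if n > threshold]


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Iterate over the file directly instead of loading it with readlines()
        with open(sys.argv[1]) as email_file:
            for domain, count in frequent_domains(email_file):
                print(domain, count)
    else:
        print("usage: python domains.py EMAIL_FILE", file=sys.stderr)
```

If saved as, say, `domains.py` (a hypothetical file name), it would be run as `python domains.py emails.txt` and print each frequent domain with its count.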