Newbie here... getting a count of repeated instances in a list.
Amy G
amy-g-art at cox.net
Sun Nov 23 23:40:06 EST 2003
Thanks again for that code help. I was able to follow your comments to
understand what you were doing. However, I was wondering how I can print
out the number of instances from the dictionary.
I might like to know not only that they are over the threshold, but what
their actual count is.
Thanks,
AMY
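
For what it's worth, here is a minimal sketch of one way to print the counts
once they are in a dictionary. It assumes the dictionary maps each domain to
the number of times it was seen; the names domain_counts and over_threshold
and the sample data are made up for illustration, and it is written for
current Python 3 rather than the Python of this thread.

```python
# Hypothetical sample data: domain -> number of times it was seen.
domain_counts = {"cox.net": 14, "example.com": 3, "example.org": 11}
threshold = 10  # same cutoff as in the quoted code below


def over_threshold(counts, limit):
    """Return (domain, count) pairs whose count exceeds limit, largest first."""
    hits = [(domain, n) for domain, n in counts.items() if n > limit]
    # Sort by descending count, then by name so ties print in a stable order.
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


for domain, n in over_threshold(domain_counts, threshold):
    print(domain, n)
```

Because the count is stored as the dictionary value, nothing has to be
recounted at print time; the loop only filters and sorts what is already there.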
"Amy G" <amy-g-art at cox.net> wrote in message
news:keyvb.5217$9O5.2011 at fed1read06...
> I started trying to learn python today. The program I am trying to write
> will open a text file containing email addresses and store them in a list.
> Then it will go through them saving only the domain portion of the email.
> After that it will count the number of times each domain occurs, and if a
> domain is above a certain threshold, it will add that domain to a list or
> text file, or whatever. For now I just have it printing to the screen.
>
> This is my code, and it works and does what I want. But I want to do
> something with a hash object to make this go a whole lot faster. Any
> suggestions are greatly appreciated.
>
> Thanks,
> Amy
>
> ps. Sorry about the long post. Just really need some help here.
>
>
> CODE
> ************************
> import sys
>
> file = open(sys.argv[1], 'r')   # Opens the file containing emails
> mail_list = file.readlines()    # and reads its contents into a list
>
> def get_domains(email_list):    # Takes a list of emails and returns
>                                 # the domains only
>     domain_list = email_list    # same list object, modified in place
>     line_count = 0
>     while line_count < len(email_list):
>         domain_list[line_count] = email_list[line_count].split('@', 1)[1]
>         domain_list[line_count] = domain_list[line_count].strip()
>         line_count = line_count + 1
>     return domain_list
>
> def count_domains(domain_list): # Takes a list of domains and returns a
>                                 # list of domains that occur more than
>                                 # <threshhold> number of times
>     counted_domains = []
>     line_count = 0
>     domain_count = 0
>     threshhold = 10
>     while line_count < len(domain_list):
>         d = domain_list[line_count]
>         domain_count = domain_list.count(d)
>         if domain_count > threshhold:
>             r = 0
>             counted_domains.append(d)
>             while r < (domain_count - 1):
>                 # Remove all other instances of a domain once counted
>                 domain_list.remove(d)
>                 r = r + 1
>         else:
>             # remove() shifts the list, so only advance if nothing was removed
>             line_count = line_count + 1
>     return counted_domains
>
>
> domains = get_domains(mail_list)
> counted = count_domains(domains)
> print(counted)
>
> ********************************************
>
>
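
On the question of making this faster with a hash object: list.count() rescans
the whole list for every entry, so the work grows roughly with the square of
the number of addresses. A dictionary keyed on the domain lets the counting
happen in a single pass. The sketch below is one way that might look, not a
drop-in replacement; get_domain, the reworked count_domains, the sample list,
and the choice to skip lines without an '@' are all assumptions made for
illustration, and it targets current Python 3.

```python
def get_domain(address):
    """Return the part after the first '@', stripped of whitespace."""
    return address.split("@", 1)[1].strip()


def count_domains(addresses):
    """Count each domain in one pass using a dictionary."""
    counts = {}
    for address in addresses:
        if "@" not in address:
            continue  # assumption: silently skip blank or malformed lines
        domain = get_domain(address)
        # dict.get returns 0 the first time a domain is seen.
        counts[domain] = counts.get(domain, 0) + 1
    return counts


# Hypothetical sample input standing in for file.readlines()
sample = ["a@cox.net\n", "b@cox.net\n", "c@example.com\n", "\n"]
counts = count_domains(sample)
print(counts)
```

The resulting dictionary keeps every domain's actual count, so the threshold
test can be applied afterwards without touching the original list again.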