Newbie here... getting a count of repeated instances in a list.

Amy G amy-g-art at cox.net
Sat Nov 22 19:16:06 CET 2003


Thanks for that.  I will try it come Monday; hopefully it will work.

AMY


"Amy G" <amy-g-art at cox.net> wrote in message
news:keyvb.5217$9O5.2011 at fed1read06...
> I started trying to learn Python today.  The program I am trying to write
> will open a text file containing email addresses and store them in a list.
> Then it will go through them, saving only the domain portion of each email.
> After that it will count the number of times each domain occurs, and if it
> is above a certain threshold, it will add that domain to a list or text
> file, or whatever.  For now I just have it printing to the screen.
>
> This is my code, and it works and does what I want.  But I want to do
> something with a hash object to make this go a whole lot faster.  Any
> suggestions are greatly appreciated.
>
> Thanks,
> Amy
>
> P.S.  Sorry about the long post.  Just really need some help here.
>
>
> CODE
> ************************
> import sys
>
> def get_domains(email_list):
>     # Takes a list of emails and returns the domains only.
>     domain_list = []
>     for line in email_list:
>         line = line.strip()
>         if '@' in line:                 # skip blank or malformed lines
>             domain_list.append(line.split('@', 1)[1])
>     return domain_list
>
> def count_domains(domain_list):
>     # Takes a list of domains and returns a list of the domains that
>     # occur more than <threshold> times.
>     counted_domains = []
>     threshold = 10
>     for d in domain_list:
>         domain_count = domain_list.count(d)
>         # Record each qualifying domain only once.
>         if domain_count > threshold and d not in counted_domains:
>             counted_domains.append(d)
>     return counted_domains
>
> # Open the file containing emails and read its lines into a list.
> with open(sys.argv[1], 'r') as f:
>     mail_list = f.readlines()
>
> domains = get_domains(mail_list)
> counted = count_domains(domains)
> print(counted)
>
> ********************************************
>
>
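
For the hash-object speedup asked about in the quoted message, one
possibility is to count the domains with a plain dictionary in a single
pass. This is only a minimal sketch, not code from this thread: the
function name domains_over_threshold and the sample addresses are
hypothetical, and the default threshold of 10 is taken from the quoted
code.

```python
def domains_over_threshold(email_lines, threshold=10):
    """Return the domains that occur more than `threshold` times."""
    counts = {}                       # domain -> number of occurrences
    for line in email_lines:
        line = line.strip()
        if '@' not in line:           # skip blank or malformed lines
            continue
        domain = line.split('@', 1)[1]
        counts[domain] = counts.get(domain, 0) + 1
    # Keep only the domains seen more than `threshold` times.
    return sorted(d for d, n in counts.items() if n > threshold)


# Hypothetical sample data, for illustration only.
sample = ["a%d@example.com\n" % i for i in range(12)] + ["bob@example.org\n"]
print(domains_over_threshold(sample))
```

A dictionary lookup takes roughly constant time on average, so the list
is walked once, whereas calling list.count() for every entry rescans the
whole list each time.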





