[Tutor] Lists of duplicates

Sri Kavi gvmcmt at gmail.com
Wed Mar 8 14:56:26 EST 2017


As part of my learning, I was participating in a discussion at:

https://discuss.codecademy.com/t/python-lists-of-duplicated-elements/78151

It’s about writing a function that returns a list of lists, where each
inner list groups together all occurrences of one element from the
original list:

function([1, 2, 3, 4, 1, 1, 2, 3]) should return [[1, 1, 1], [2, 2], [3, 3], [4]]


I wrote and submitted the following function:

def sublist_duplicates(lst):
    sublists = []

    # For every distinct item, build a sublist with one copy per occurrence.
    for item in set(lst):
        sublists.append([item] * lst.count(item))

    return sublists

lst_of_duplicates = [1, 2, 10, 3, 4, 1, 's', 2, 3, 1, 4, 's']

print(sublist_duplicates(lst_of_duplicates))

A participant told me that this will grind to a near-halt when there are
many distinct elements -- for example range(1_000_000) (a million unique
values). In that case lst.count() scans the whole million-element list
once per unique value, for a total of about 10**12 (a trillion) element
comparisons.
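The back-of-the-envelope arithmetic behind that warning can be sketched
like this (my own illustration, not from the thread):

```python
# Rough cost model for the count-based version on a list of all-unique values.
n = 1_000_000          # length of the list
unique = n             # every value is distinct, so set(lst) has n members
scans = unique * n     # each lst.count(item) walks the full list once

print(scans)           # a trillion element comparisons
```

In other words the work grows quadratically with the number of distinct
elements, which is what makes the first version impractical for large inputs.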


I thought he was right. I wrote another version of the function, this time
using collections.Counter.

from collections import Counter

def sublist_duplicates(lst):
    counts = Counter(lst)  # a single pass over lst: O(len(lst))
    sublists = [[k] * counts[k] for k in counts]
    return sublists

_lst = list(range(1000000))

import timeit

time = timeit.timeit(
    "sublist_duplicates(_lst)",
    setup="from __main__ import sublist_duplicates, _lst",
    number=1,
)

print("Number of elements in the list: {}".format(len(_lst)))
print("Execution time of the function: {} seconds".format(round(time, 2)))
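As a quick sanity check (my addition, assuming the two definitions above;
the function names here are mine to keep both in one script), the versions
can be compared for identical output once ordering is ignored:

```python
from collections import Counter

def sublist_duplicates_count(lst):
    # The original count-based version.
    sublists = []
    for item in set(lst):
        sublists.append([item] * lst.count(item))
    return sublists

def sublist_duplicates_counter(lst):
    # The Counter-based version.
    counts = Counter(lst)
    return [[k] * counts[k] for k in counts]

sample = [1, 2, 10, 3, 4, 1, 2, 3, 1, 4]
# Sort by repr so the comparison ignores the differing iteration orders.
a = sorted(sublist_duplicates_count(sample), key=repr)
b = sorted(sublist_duplicates_counter(sample), key=repr)
print(a == b)  # True
```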

I found that the first version does slightly better on a small list, but
the second version far outperforms it on a considerably larger one.
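One way to see that crossover is a small benchmark sketch of my own (the
helper names and input sizes here are mine, not from the post):

```python
import timeit
from collections import Counter

def with_count(lst):
    # Count-based version: one full scan of lst per distinct item.
    return [[item] * lst.count(item) for item in set(lst)]

def with_counter(lst):
    # Counter-based version: a single pass to tally, then build sublists.
    counts = Counter(lst)
    return [[k] * counts[k] for k in counts]

small = [1, 2, 3, 1, 2, 1]
big = list(range(5000))  # 5000 unique values: the worst case for .count()

for name, fn, data, reps in [
    ("count / small", with_count, small, 10000),
    ("Counter / small", with_counter, small, 10000),
    ("count / big", with_count, big, 1),
    ("Counter / big", with_counter, big, 1),
]:
    t = timeit.timeit(lambda: fn(data), number=reps)
    print("{:16s} {:.4f} s".format(name, t))
```

On the big all-unique input the count-based version has to do
5000 * 5000 = 25 million comparisons, so the Counter version should win
clearly there, while on the tiny list the difference is down to Counter's
setup overhead.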


I’m asking for your comments on these two functions. I’m not even sure if
this is how it should be coded.

Sri

