[Tutor] Lists of duplicates

Sri Kavi gvmcmt at gmail.com
Wed Mar 8 14:56:26 EST 2017


As part of my learning, I was participating in a discussion at:

https://discuss.codecademy.com/t/python-lists-of-duplicated-elements/78151

It’s about writing a function that returns a list of lists, where each
inner list groups together all occurrences of one element from the
original list:

function([1, 2, 3, 4, 1, 1, 2, 3]) should return [[1, 1, 1], [2, 2], [3, 3], [4]]


I wrote and submitted the following function:

def sublist_duplicates(lst):
    sublists = []

    # For every distinct item, build a sublist with one copy per occurrence.
    for item in set(lst):
        sublists.append([item] * lst.count(item))

    return sublists

lst_of_duplicates = [1, 2, 10, 3, 4, 1, 's', 2, 3, 1, 4, 's']

print(sublist_duplicates(lst_of_duplicates))

A participant told me that this will grind to a near-halt when there are
many distinct elements -- for example range(1_000_000) (a million unique
values). In that case lst.count() scans the whole million-element list
once per unique value, for a total of about 10**12 (a trillion) element
comparisons.
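The back-of-the-envelope arithmetic behind that warning can be sketched
like this (my own illustration, not from the thread):

```python
# Rough cost model for the count-based version on a list of all-unique values.
n = 1_000_000          # length of the list
unique = n             # every value is distinct, so set(lst) has n members
scans = unique * n     # each lst.count(item) walks the full list once

print(scans)           # a trillion element comparisons
```

In other words the work grows quadratically with the number of distinct
elements, which is what makes the first version impractical for large inputs.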


I thought he was right. I wrote another version of the function, this time
using collections.Counter.

from collections import Counter

def sublist_duplicates(lst):
    counts = Counter(lst)  # a single pass over lst: O(len(lst))
    sublists = [[k] * counts[k] for k in counts]
    return sublists

_lst = list(range(1000000))

import timeit

time = timeit.timeit(
    "sublist_duplicates(_lst)",
    setup="from __main__ import sublist_duplicates, _lst",
    number=1,
)

print("Number of elements in the list: {}".format(len(_lst)))
print("Execution time of the function: {} seconds".format(round(time, 2)))
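As a quick sanity check (my addition, assuming the two definitions above;
the function names here are mine to keep both in one script), the versions
can be compared for identical output once ordering is ignored:

```python
from collections import Counter

def sublist_duplicates_count(lst):
    # The original count-based version.
    sublists = []
    for item in set(lst):
        sublists.append([item] * lst.count(item))
    return sublists

def sublist_duplicates_counter(lst):
    # The Counter-based version.
    counts = Counter(lst)
    return [[k] * counts[k] for k in counts]

sample = [1, 2, 10, 3, 4, 1, 2, 3, 1, 4]
# Sort by repr so the comparison ignores the differing iteration orders.
a = sorted(sublist_duplicates_count(sample), key=repr)
b = sorted(sublist_duplicates_counter(sample), key=repr)
print(a == b)  # True
```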

I found that the first version does slightly better on a small list, but
the second version far outperforms it on a considerably larger one.
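One way to see that crossover is a small benchmark sketch of my own (the
helper names and input sizes here are mine, not from the post):

```python
import timeit
from collections import Counter

def with_count(lst):
    # Count-based version: one full scan of lst per distinct item.
    return [[item] * lst.count(item) for item in set(lst)]

def with_counter(lst):
    # Counter-based version: a single pass to tally, then build sublists.
    counts = Counter(lst)
    return [[k] * counts[k] for k in counts]

small = [1, 2, 3, 1, 2, 1]
big = list(range(5000))  # 5000 unique values: the worst case for .count()

for name, fn, data, reps in [
    ("count / small", with_count, small, 10000),
    ("Counter / small", with_counter, small, 10000),
    ("count / big", with_count, big, 1),
    ("Counter / big", with_counter, big, 1),
]:
    t = timeit.timeit(lambda: fn(data), number=reps)
    print("{:16s} {:.4f} s".format(name, t))
```

On the big all-unique input the count-based version has to do
5000 * 5000 = 25 million comparisons, so the Counter version should win
clearly there, while on the tiny list the difference is down to Counter's
setup overhead.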


I’m asking for your comments on these two functions. I’m not even sure if
this is how it should be coded.

Sri

