[Tutor] Lists of duplicates

Sri Kavi gvmcmt at gmail.com
Wed Mar 8 14:56:26 EST 2017

As part of my learning, I was participating in a discussion at:

It’s about making a function that returns a list of lists, with each list
being all of the elements that are the same as another element in the
original list.

function([1, 2, 3, 4, 1, 1, 2, 3]) should return [[1, 1, 1], [2, 2], [3, 3]]

I wrote and submitted the following function:

def sublist_duplicates(lst):
    sublists = []

    for item in set(lst):
        sublists.append([item] * lst.count(item))

    return sublists

lst_of_duplicates = [1, 2, 10, 3, 4, 1, 's', 2, 3, 1, 4, 's']

print(sublist_duplicates(lst_of_duplicates))

A participant told me that it'll grind to a near-halt if there are many
different elements, for example range(1_000_000): with a million unique
values, the function scans the whole million-element list once per count()
call, a million times in total, roughly 10**12 (a trillion) operations.
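
That quadratic growth can be made visible with a quick timing loop. The sketch below is my own illustration (the helper name count_based is mine, not from the discussion): it times the count()-based approach on all-unique lists of increasing size, the worst case for this version.

```python
import timeit

def count_based(lst):
    # Same approach as the first version: list.count() rescans the whole
    # list for every distinct item, so n unique items cost O(n**2) work.
    return [[item] * lst.count(item) for item in set(lst)]

for n in (1_000, 2_000, 4_000):
    lst = list(range(n))  # all elements unique: worst case for .count()
    t = timeit.timeit(lambda: count_based(lst), number=1)
    print("n={}: {:.4f} seconds".format(n, t))
# Doubling n roughly quadruples the time, the signature of O(n**2).
```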

I thought he was right. I wrote another version of the function, this time
using collections.Counter.

from collections import Counter

def sublist_duplicates(lst):
    counts = Counter(lst)
    sublists = [[k] * counts[k] for k in counts]
    return sublists

_lst = list(range(1000000))

import timeit

time = timeit.timeit("sublist_duplicates(_lst)",
                     setup="from __main__ import sublist_duplicates, _lst",
                     number=1)

print("Number of elements in the list: {}".format(len(_lst)))
print("Execution time of the function: {} seconds".format(round(time, 2)))

I found that the first version is slightly faster on a small list, but the
second version clearly outperforms the first one when it's given a
considerably larger list of data.
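
To check that crossover directly, both versions can be timed side by side. This is just a sketch of how I'd compare them (the input sizes are my own choice, and the exact crossover point will vary by machine):

```python
import timeit
from collections import Counter

def sublist_duplicates_count(lst):
    # First version: repeated list.count(), O(n**2) when items are unique.
    return [[item] * lst.count(item) for item in set(lst)]

def sublist_duplicates_counter(lst):
    # Second version: one Counter pass over the list, O(n).
    counts = Counter(lst)
    return [[k] * counts[k] for k in counts]

for n in (100, 1_000, 4_000):
    lst = list(range(n))
    t1 = timeit.timeit(lambda: sublist_duplicates_count(lst), number=1)
    t2 = timeit.timeit(lambda: sublist_duplicates_counter(lst), number=1)
    print("n={}: count-based {:.5f} s, Counter-based {:.5f} s".format(n, t1, t2))
```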