[Tutor] Lists of duplicates
Sri Kavi
gvmcmt at gmail.com
Wed Mar 8 14:56:26 EST 2017
As part of my learning, I was participating in a discussion at:
https://discuss.codecademy.com/t/python-lists-of-duplicated-elements/78151
It’s about writing a function that returns a list of lists, where each inner
list contains all occurrences of one value from the original list. For
example, function([1, 2, 3, 4, 1, 1, 2, 3]) should return
[[1, 1, 1], [2, 2], [3, 3], [4]].
I wrote and submitted the following function:
def sublist_duplicates(lst):
    sublists = []
    for item in set(lst):
        sublists.append([item] * lst.count(item))
    return sublists

lst_of_duplicates = [1, 2, 10, 3, 4, 1, 's', 2, 3, 1, 4, 's']
print(sublist_duplicates(lst_of_duplicates))
A participant told me that it will grind to a near-halt if there are many
distinct elements, for example range(1_000_000): with a million unique
values, lst.count() scans the whole million-element list once per element,
so about 10**12 (a trillion) operations in total.
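To make that cost concrete, here is a small sketch of my own (the variable
names are mine, not from the thread) of the arithmetic behind the quadratic
blow-up:

```python
# each lst.count(item) walks the entire list once, so counting every
# distinct item separately costs roughly n * n element comparisons
n = 1_000_000
distinct = n                  # range(1_000_000) has all-unique values
comparisons = distinct * n    # one full scan of length n per distinct item
print(comparisons)            # 10**12, i.e. a trillion comparisons
```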
I thought he was right. I wrote another version of the function, this time
using collections.Counter.
from collections import Counter

def sublist_duplicates(lst):
    counts = Counter(lst)
    sublists = [[k] * counts[k] for k in counts]
    return sublists

_lst = list(range(1000000))

import timeit
time = timeit.timeit("sublist_duplicates(_lst)",
                     setup="from __main__ import sublist_duplicates, _lst",
                     number=1)
print("Number of elements in the list: {}".format(len(_lst)))
print("Execution time of the function: {} seconds".format(round(time, 2)))
I found that the first version is faster on small lists, but the second
version clearly outperforms it once the list grows large.
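For comparison’s sake, there is also a sorting-based approach (my own
sketch, not from the thread) using itertools.groupby. It runs in
O(n log n) and requires the elements to be comparable with each other,
whereas Counter only needs them to be hashable, so on Python 3 it would
fail on the mixed int/str list above:

```python
from itertools import groupby

def sublist_duplicates_sorted(lst):
    # sort the list (O(n log n)), then group runs of equal neighbours
    return [list(group) for _, group in groupby(sorted(lst))]

print(sublist_duplicates_sorted([1, 2, 3, 4, 1, 1, 2, 3]))
# [[1, 1, 1], [2, 2], [3, 3], [4]]
```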
I’m asking for your comments on these two functions. I’m not even sure if
this is how it should be coded.
Sri