# [Tutor] Lists of duplicates

Steven D'Aprano steve at pearwood.info
Thu Mar 9 08:54:22 EST 2017

On Thu, Mar 09, 2017 at 01:26:26AM +0530, Sri Kavi wrote:

> I wrote and submitted the following function:
>
> def sublist_duplicates(lst):
>     sublists = []
>     for item in set(lst):
>         sublists.append([item] * lst.count(item))
>     return sublists
>
> lst_of_duplicates = [1, 2, 10, 3, 4, 1, 's', 2, 3, 1, 4, 's']
>
> print(sublist_duplicates(lst_of_duplicates))
>
> A participant told me that it will grind to a near-halt if there are many
> different elements, for example range(1_000_000) (1 million unique values,
> so it checks the whole list (length 1 million) for the count, a million
> times, a total of 1000 billion operations…

Correct. lst.count(item) has to walk the entire list, counting elements.
It does that in C code, so it is fast, but still, if there are a lot of
elements to walk, it takes some time. Then you do it again, and again,
and again...

For a list of N items that are all distinct, your sublist_duplicates
function walks the entire list N times, once per unique value, touching
N*N elements in total. When N is large, N**2 is very much larger.
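
You can watch the quadratic growth directly. Here is a minimal timing
sketch (the sizes and repeat count are arbitrary choices of mine; only
the growth pattern matters, not the absolute numbers):

```python
import timeit

def sublist_duplicates(lst):
    # the original function, unchanged
    sublists = []
    for item in set(lst):
        sublists.append([item] * lst.count(item))
    return sublists

for n in (1_000, 2_000, 4_000):
    lst = list(range(n))  # n distinct values: the worst case for count()
    print(n, timeit.timeit(lambda: sublist_duplicates(lst), number=5))
# Doubling n roughly quadruples the time -- the signature of O(N**2).
```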

> I thought he was right. I wrote another version of the function, this time
> using collections.Counter.
>
> from collections import Counter
>
> def sublist_duplicates(lst):
>     counts = Counter(lst)
>     sublists = [[k] * counts[k] for k in counts]
>     return sublists
>
> _lst = list(range(1000000))
[...]
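
For what it's worth, the Counter version gives the same groups on the
small sample list from your first message (a quick sketch; the order of
the sublists follows the Counter's iteration order, so it may differ
between Python versions):

```python
from collections import Counter

def sublist_duplicates(lst):
    counts = Counter(lst)  # a single O(N) counting pass
    return [[k] * counts[k] for k in counts]

lst_of_duplicates = [1, 2, 10, 3, 4, 1, 's', 2, 3, 1, 4, 's']
print(sublist_duplicates(lst_of_duplicates))
# -> [[1, 1, 1], [2, 2], [10], [3, 3], [4, 4], ['s', 's']]
```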

> I found that the first version works better with a small list,

Yes, but for a small list the total time is likely to be so small that
it doesn't really matter.

> but the
> second version outperforms the first one when it’s given a considerably
> larger list of data.
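
That is exactly what the analysis above predicts. If you want to
reproduce the comparison yourself, something along these lines would do
it (a sketch: the helper names and the list size are my choices, and the
actual timings will vary by machine and Python version):

```python
import timeit
from collections import Counter

def with_count(lst):
    # the original approach: one full scan of lst per distinct item
    return [[item] * lst.count(item) for item in set(lst)]

def with_counter(lst):
    # the Counter approach: one counting pass, one building pass
    counts = Counter(lst)
    return [[k] * v for k, v in counts.items()]

big = list(range(10_000))  # 10,000 distinct values

print("count:  ", timeit.timeit(lambda: with_count(big), number=1))
print("Counter:", timeit.timeit(lambda: with_counter(big), number=1))
# Expect the Counter version to win by orders of magnitude at this size;
# on a dozen-element list the difference disappears into the noise.
```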