[Tutor] Looking for duplicates within a list

Fri Jun 11 18:18:52 CEST 2010

On 11 jun 2010, at 17:49, Steven D'Aprano <steve at pearwood.info> wrote:

> On Sat, 12 Jun 2010 12:58:19 am Alan Gauld wrote:
>
>> Have you looked at the count method of lists?
>>
>> Something like:
>>
>> counts = set(( item, mylist.count(item)) for item in mylist if
>> mylist.count(item) > 1)
>
> That's a Shlemiel the Painter algorithm.
>
> http://www.joelonsoftware.com/articles/fog0000000319.html
>

It is, but it's also very elegant and simple to understand. And it
works fine on small inputs. Everything is fast for small n.

>
>> Seems to work...
>
> You say that now, but one day you will use it on a list of 100,000
> items, and you'll wonder why it takes 45 minutes to finish, and curse
> Python for being slow.
>

Actually, now that you know it is a shlemiel the painter's algorithm,
you won't have to wonder anymore. And you'll just say: "well, this
piece of code might need to handle huge lists someday, I'll use a
dictionary."

I guess what I'm trying to say is: using code that performs bad in
situations that won't be encountered anyway is not inherently bad.
Otherwise, we'd all still be writing everything in C.

The corrolary, of course, is that you should always know what the
performance characteristics of your code are, and what kind of data it
will handle.

Hugo