how to remove multiple occurrences of a string within a list?

Wed Apr 4 22:43:28 EDT 2007

Amit Khemka <khemkaamit at gmail.com> wrote:

> On 3 Apr 2007 11:20:33 -0700, bahoo <b83503104 at yahoo.com> wrote:
> > Hi,
> >
> > I have a list like ['0024', 'haha', '0024']
> > and as output I want ['haha']
> >
> > If I
> > myList.remove('0024')
> >
> > then only the first instance of '0024' is removed.
> 
> To remove all items with multiple occurances in myList:
> 
> list(set(myList) - set([x for x in myList if myList.count(x)>1]))

This approach is unfortunately quite inefficient, because each call to
myList.count is O(N), and there are O(N) such calls, so the whole thing
is O(N squared).  [also, the inner square brackets are not needed, and
cause more memory to be consumed than omitting them would].

A "we-don't-need-no-stinkin'-one-liners" more relaxed approach:

import collections
d = collections.defaultdict(int)
for x in myList: d[x] += 1
list(x for x in myList if d[x]==1)

yields O(N) performance (give that dict-indexing is about O(1)...).

Collapsing this kind of approach back into a "one-liner" while keeping
the O(N) performance is not easy -- whether this is a fortunate or
unfortunate ocurrence is of course debatable.  If we had a "turn
sequence into bag" function somewhere (and it might be worth having it
for other reasons):

def bagit(seq):
  import collections
  d = collections.defaultdict(int)
  for x in seq: d[x] += 1
  return d

then one-linerness might be achieved in the "gimme nonduplicated items"
task via a dirty, rotten trick...:

list(x for bag in [bagit(myList)] for x in myList if bag[x] == 1)

...which I would of course never mention in polite company...

Alex