[Tutor] Best method for filtering lists in lists...

John Fouhy john at fouhy.net
Wed Dec 20 02:59:16 CET 2006


On 17/12/06, Alan Gauld <alan.gauld at btinternet.com> wrote:
> As for Luke's point about the speed of conversion, I'm not
> sure how that would stack up. My gut feeling is that it would
> be faster than comparing the lists element by element, which
> sounds like it grows quadratically with list size; a hashed
> set should be faster. But as with all such things, if speed
> matters: measure and profile it.

Here's my attempt at a "smarter" way of removing duplicates:
(note that this code assumes 'None' does not occur in your lists)
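
To illustrate the idea before the code: each list is stored as a path
in a trie of nested dicts, with None marking where a stored list ends.
This is my own illustration (using dict.setdefault), not part of the
code below:

```python
# Illustration: inserting [1, 2] and [1, 3] into an empty trie
# yields this nested-dict shape; the None key marks a list's end.
trie = {}
for lst in ([1, 2], [1, 3]):
    curr = trie
    for elem in lst:
        curr = curr.setdefault(elem, {})  # walk/create the path
    curr[None] = None                     # terminator

expected = {1: {2: {None: None}, 3: {None: None}}}
```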

def nodupes(lst):
    trie = {}
    for l in lst:
        addToTrie(l, trie)

    return unTrie([], trie)

def addToTrie(lst, trie):
    curr = trie
    for i, elem in enumerate(lst):
        try:
            curr = curr[elem]
        except KeyError:
            for e in lst[i:]:
                curr[e] = {}
                curr = curr[e]
            break
    # Mark the end of the list here, outside the loop, so that a
    # list which is a prefix of one already in the trie (e.g. adding
    # [1, 2] after [1, 2, 3]) still gets its terminator.
    curr[None] = None

def unTrie(prefix, trie):
    lst = []
    for elem in trie:
        if elem is None:
            lst.append(prefix)
        else:
            lst.extend(unTrie(prefix + [elem], trie[elem]))

    return lst

According to timeit, this is over five times slower than the one-line solution:

def nodupes2(lst):
    return [list(y) for y in set(tuple(x) for x in lst)]
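
For reference, a self-contained sketch of how such a timeit comparison
can be run (the sample data here is made up, not what I measured with):

```python
import timeit

def nodupes2(lst):
    return [list(y) for y in set(tuple(x) for x in lst)]

# Made-up sample data: 1000 short lists with many duplicates.
data = [[i % 10, (i * 3) % 7] for i in range(1000)]

elapsed = timeit.timeit(lambda: nodupes2(data), number=100)
print("nodupes2: %.4f s for 100 runs" % elapsed)
```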

Finally, neither of these is order-preserving...
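
If order matters, here is a sketch of an order-preserving variant (my
addition, not timed against the others): track the tuples already seen
in a set, keeping only the first occurrence of each list:

```python
def nodupes_ordered(lst):
    seen = set()
    result = []
    for x in lst:
        key = tuple(x)       # lists are unhashable; tuples are not
        if key not in seen:
            seen.add(key)
            result.append(x)
    return result
```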

-- 
John.
