[Pythonmac-SIG] Fast way to get a list of unique values?

landauer@got.net landauer@got.net
Mon, 9 Sep 2002 17:30:57 -0700


Quoting Robb Brown <brownr@ucalgary.ca>:
>
> I have a large array (Numeric) of integers.  I'd like to get a list 
> of all the unique values in that array.  Naturally I'd like to avoid 
> a for loop.

Here's one way.  It does avoid using a for loop.
I'm guessing that it isn't quite exactly what you wanted.
Fun exercise, though.   See below for a more serious suggestion.
(This is *Python*, after all...)

#!/usr/bin/env python
#
#  Robb has a "large array (Numeric) of integers.  I'd like to get
#   a list of all the unique values in that array.  Naturally I'd like
#   to avoid a for loop.  Is there an easy way to do this in Python?
#
#  No mention of speed ...
#

from operator import __mul__

#  --- from the cookbook, w/ slight mods --
# Sieve of Eratosthenes
# David Eppstein, UC Irvine, 28 Feb 2002
# From the ASPN Python cookbook ...
#  http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/117119
#
from __future__ import generators

def eratosthenes ():
    '''Yields the sequence of prime numbers via the Sieve of Eratosthenes.'''
    D = {}  # map composite integers to primes witnessing their compositeness
    q = 2   # first integer to test for primality
    while 1:
	if q not in D:
	    yield q        # not marked composite, must be prime
            D[q*q] = [q]   # first multiple of q not already marked
        else:
            map( lambda p: D.setdefault(p+q,[]).append(p), D[q] )
            del D[q]       # no longer need D[q], free memory
        q += 1

primes_iter = eratosthenes()
primes_list = [ ]
prim_d = { }

def primes ( n ):
    while  len(primes_list) <= n:
        nx = primes_iter.next()
        primes_list.append( nx )
        prim_d[ nx ] = len(primes_list)-1
    return primes_list[n]
    
def uniq ( arr ):
    # map the array into a goedel number of sorts
    gm = min( arr )
    gx = max( arr )

    # offset the list to n in [ 0 .. max-min+1 ]
    # and then find the nth prime
    bias = map( lambda x: primes(x-gm), arr )

    # blast all those primes together
    goedel = reduce( __mul__, bias )

    # find the ones that factor into goedel
    fax = filter( lambda n: goedel%n == 0, primes_list )

    # print len(arr), arr
    # print bias
    # print goedel
    # print len(fax), fax
    return map( lambda z: prim_d[z] + gm, fax )

if __name__ == "__main__":
    def test_e ():
        iter = eratosthenes()
        lst = []
        while 1:
            ff = iter.next()
            lst.append( ff )
            print ff
            if ff > 1000:
                break
        print lst[ -12: ]
            
    def test_p ():
        print primes( 12 )

    def test ():
        arr = [ -9, -2, -4, 0, 19, 2, 42, -9, -2, -4, 0, 19, 2, 421, 32, 42 ]
        print uniq( arr )

    test_e()
    test_p()

    test()


>  Is there an easy way to do this in Python?

You do not tell us where you acquired that "large array".
My suggestion, given the lack of any other detail, would be to see if
it's possible to create that array as a dict instead (with the numbers
as keys, as Bob mentioned).   I.e., do the unique-ifying while building
the array, instead of treating it as a separate problem.




-=-=-=-

food. shelter. clothing. net.         Got.net - The Internet Connection, Inc