# [Pythonmac-SIG] Fast way to get a list of unique values?

Larry Meyn lmeyn@mail.arc.nasa.gov
Mon, 9 Sep 2002 14:14:01 -0700

```--Apple-Mail-2--388249761
Content-Transfer-Encoding: 7bit
Content-Type: text/plain;
charset=US-ASCII;
format=flowed

Here's a function from the Python Cookbook:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52560

def unique(s):
"""Return a list of the elements in s, but without duplicates.

For example, unique([1,2,3,1,2,3]) is some permutation of [1,2,3],
unique("abcabc") some permutation of ["a", "b", "c"], and
unique(([1, 2], [2, 3], [1, 2])) some permutation of
[[2, 3], [1, 2]].

For best speed, all sequence elements should be hashable.  Then
unique() will usually work in linear time.

If not possible, the sequence elements should enjoy a total
ordering, and if list(s).sort() doesn't raise TypeError it's
assumed that they do enjoy a total ordering.  Then unique() will
usually work in O(N*log2(N)) time.

If that's not possible either, the sequence elements must support
equality-testing.  Then unique() will usually work in quadratic
time.
"""

n = len(s)
if n == 0:
return []

# Try using a dict first, as that's the fastest and will usually
# work.  If it doesn't work, it will usually fail quickly, so it
# usually doesn't cost much to *try* it.  It requires that all the
# sequence elements be hashable, and support equality comparison.
u = {}
try:
for x in s:
u[x] = 1
except TypeError:
del u  # move on to the next method
else:
return u.keys()

# We can't hash all the elements.  Second fastest is to sort,
# which brings the equal elements together; then duplicates are
# easy to weed out in a single pass.
# NOTE:  Python's list.sort() was designed to be efficient in the
# presence of many duplicate elements.  This isn't true of all
# sort functions in all languages or libraries, so this approach
# is more effective in Python than it may be elsewhere.
try:
t = list(s)
t.sort()
except TypeError:
del t  # move on to the next method
else:
assert n > 0
last = t[0]
lasti = i = 1
while i < n:
if t[i] != last:
t[lasti] = last = t[i]
lasti += 1
i += 1
return t[:lasti]

# Brute force is all that's left.
u = []
for x in s:
if x not in u:
u.append(x)
return u

On Monday, September 9, 2002, at 01:48  PM, Robb Brown wrote:

>
> I have a large array (Numeric) of integers.  I'd like to get a list of
> all the unique values in that array.  Naturally I'd like to avoid a
> for loop.  Is there an easy way to do this in Python?
>
> Thanks,
>
> Robb
> --
> ______________________________
> Robb Brown
> Seaman Family MR Research Centre
>
> _______________________________________________
> Pythonmac-SIG maillist  -  Pythonmac-SIG@python.org
> http://mail.python.org/mailman/listinfo/pythonmac-sig
>
>

Larry Meyn
Aerospace Operations Modeling Office

M/S 210-10                      Phone:  (650) 604-5038
NASA Ames Research Center       Fax:    (650) 604-0222
Moffett Field, CA 94035-1000    E-mail: lmeyn@mail.arc.nasa.gov
E-Fax:  (425) 944-5526 sent via e-mail

--Apple-Mail-2--388249761
Content-Transfer-Encoding: 7bit
Content-Type: text/enriched;
charset=US-ASCII

Here's a function from the Python Cookbook:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52560

def unique(s):

"""Return a list of the elements in s, but without duplicates.

For example, unique([1,2,3,1,2,3]) is some permutation of [1,2,3],

unique("abcabc") some permutation of ["a", "b", "c"], and

unique(([1, 2], [2, 3], [1, 2])) some permutation of

[[2, 3], [1, 2]].

For best speed, all sequence elements should be hashable.  Then

unique() will usually work in linear time.

If not possible, the sequence elements should enjoy a total

ordering, and if list(s).sort() doesn't raise TypeError it's

assumed that they do enjoy a total ordering.  Then unique() will

usually work in O(N*log2(N)) time.

If that's not possible either, the sequence elements must support

equality-testing.  Then unique() will usually work in quadratic

time.

"""

n = len(s)

if n == 0:

return []

# Try using a dict first, as that's the fastest and will usually

# work.  If it doesn't work, it will usually fail quickly, so it

# usually doesn't cost much to *try* it.  It requires that all the

# sequence elements be hashable, and support equality comparison.

u = {}

try:

for x in s:

u[x] = 1

except TypeError:

del u  # move on to the next method

else:

return u.keys()

# We can't hash all the elements.  Second fastest is to sort,

# which brings the equal elements together; then duplicates are

# easy to weed out in a single pass.

# NOTE:  Python's list.sort() was designed to be efficient in the

# presence of many duplicate elements.  This isn't true of all

# sort functions in all languages or libraries, so this approach

# is more effective in Python than it may be elsewhere.

try:

t = list(s)

t.sort()

except TypeError:

del t  # move on to the next method

else:

assert n > 0

last = t[0]

lasti = i = 1

while i << n:

if t[i] != last:

t[lasti] = last = t[i]

lasti += 1

i += 1

return t[:lasti]

# Brute force is all that's left.

u = []

for x in s:

if x not in u:

u.append(x)

return u

On Monday, September 9, 2002, at 01:48  PM, Robb Brown wrote:

<excerpt>

I have a large array (Numeric) of integers.  I'd like to get a list of
all the unique values in that array.  Naturally I'd like to avoid a
for loop.  Is there an easy way to do this in Python?

Thanks,

Robb

--

______________________________

Robb Brown

Seaman Family MR Research Centre

_______________________________________________

Pythonmac-SIG maillist  -  Pythonmac-SIG@python.org

http://mail.python.org/mailman/listinfo/pythonmac-sig

</excerpt>

<fontfamily><param>Courier</param>Larry Meyn

Aerospace Operations Modeling Office

M/S 210-10                      Phone:  (650) 604-5038

NASA Ames Research Center       Fax:    (650) 604-0222

Moffett Field, CA 94035-1000    E-mail: lmeyn@mail.arc.nasa.gov

E-Fax:  (425) 944-5526 sent via e-mail

</fontfamily>
--Apple-Mail-2--388249761--

```