Selecting unique values

Sells, Fred fred.sells at adventistcare.org
Tue Jul 26 19:39:25 CEST 2011


The sets module (Python 2.3) or the built-in set type (Python 2.4 and
later) will do this if you make each record a tuple.
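
A minimal sketch of that idea, assuming the whitespace-separated layout
shown in the question below. The sample lines and the helper name
to_record are made up for illustration. Note that a set does not keep
the input order, so the result is sorted before use.

```python
# Hypothetical sample records in the layout from the question.
lines = [
    "Abies concolor -106.601 35.868",
    "Accipiter cooperi -77.38333 39.68333",
    "Accipiter cooperi -77.38333 39.68333",
]

def to_record(line):
    # Split from the right so the two-word species name stays in one piece.
    name, lon, lat = line.rsplit(None, 2)
    return (name, lon, lat)

# Tuples are hashable, so a set collapses exact duplicates.
unique_records = set(to_record(line) for line in lines)

# A set is unordered; sort to get a stable listing.
for record in sorted(unique_records):
    print(" ".join(record))
```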

-----Original Message-----
From: python-list-bounces+frsells=adventistcare.org at python.org
[mailto:python-list-bounces+frsells=adventistcare.org at python.org] On
Behalf Of Peter Otten
Sent: Tuesday, July 26, 2011 5:04 AM
To: python-list at python.org
Subject: Re: Selecting unique values

Kumar Mainali wrote:

> I have a dataset with occurrence records of multiple species. I need to
> get rid of multiple listings of the same occurrence point for a species
> (as you see below in red and blue typeface). How do I create a dataset
> only with unique set of longitude and latitude for each species? Thanks in
> advance.
> 
> Species_name Longitude Latitude
> Abies concolor -106.601 35.868
> Abies concolor -106.493 35.9682
> Abies concolor -106.489 35.892
> Abies concolor -106.496 35.8542
> Accipiter cooperi -119.688 34.4339
> Accipiter cooperi -119.792 34.5069
> Accipiter cooperi -118.797 34.2581
> Accipiter cooperi -77.38333 39.68333
> Accipiter cooperi -77.38333 39.68333
> Accipiter cooperi -75.99153 40.633335
> Accipiter cooperi -75.99153 40.633335

>>> def uniquify(items):
...     seen = set()
...     for item in items:
...         if item not in seen:
...             seen.add(item)
...             yield item
...
>>> import sys
>>> sys.stdout.writelines(uniquify(open("species.txt")))
Species_name Longitude Latitude
Abies concolor -106.601 35.868
Abies concolor -106.493 35.9682
Abies concolor -106.489 35.892
Abies concolor -106.496 35.8542
Accipiter cooperi -119.688 34.4339
Accipiter cooperi -119.792 34.5069
Accipiter cooperi -118.797 34.2581
Accipiter cooperi -77.38333 39.68333
Accipiter cooperi -75.99153 40.633335

If you need to massage the lines a bit:

>>> def uniquify(items, key=None):
...     seen = set()
...     for item in items:
...         if key is None:
...             keyval = item
...         else:
...             keyval = key(item)
...         if keyval not in seen:
...             seen.add(keyval)
...             yield item
...

Unique latitudes:

>>> sys.stdout.writelines(uniquify(open("species.txt"),
...     key=lambda s: s.rsplit(None, 1)[-1]))
Species_name Longitude Latitude
Abies concolor -106.601 35.868
Abies concolor -106.493 35.9682
Abies concolor -106.489 35.892
Abies concolor -106.496 35.8542
Accipiter cooperi -119.688 34.4339
Accipiter cooperi -119.792 34.5069
Accipiter cooperi -118.797 34.2581
Accipiter cooperi -77.38333 39.68333
Accipiter cooperi -75.99153 40.633335

Unique species names:

>>> sys.stdout.writelines(uniquify(open("species.txt"),
...     key=lambda s: s.rsplit(None, 2)[0]))
Species_name Longitude Latitude
Abies concolor -106.601 35.868
Accipiter cooperi -119.688 34.4339
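
The original question asks for unique longitude/latitude pairs per
species. One way to express that with a key function is sketched here.
The helper record_key and the sample rows are assumptions for
illustration, not part of the session above. Converting the coordinates
to float makes "-77.38333" and "-77.383330" count as the same point,
which a plain string comparison would not. A header line would have to
be skipped before applying this key, since it does not parse as numbers.

```python
def uniquify(items, key=None):
    """Yield items in order, skipping any whose key was already seen."""
    seen = set()
    for item in items:
        keyval = item if key is None else key(item)
        if keyval not in seen:
            seen.add(keyval)
            yield item

def record_key(line):
    # Hypothetical helper: species name plus numeric coordinates.
    # rsplit from the right keeps the two-word species name intact.
    name, lon, lat = line.rsplit(None, 2)
    return (name, float(lon), float(lat))

# Hypothetical rows; the second differs from the first only in formatting.
rows = [
    "Accipiter cooperi -77.38333 39.68333\n",
    "Accipiter cooperi -77.383330 39.68333\n",
    "Accipiter cooperi -75.99153 40.633335\n",
]

unique_rows = list(uniquify(rows, key=record_key))
```

Because the generator yields the original item rather than the key, the
first spelling of each record is what ends up in the output.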

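Bonus: open() is not the built-in here; for this demo it is replaced by a
function that returns the sample data as a file-like StringIO object: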

>>> from StringIO import StringIO
>>> def open(filename):
...     return StringIO("""Species_name Longitude Latitude
... Abies concolor -106.601 35.868
... Abies concolor -106.493 35.9682
... Abies concolor -106.489 35.892
... Abies concolor -106.496 35.8542
... Accipiter cooperi -119.688 34.4339
... Accipiter cooperi -119.792 34.5069
... Accipiter cooperi -118.797 34.2581
... Accipiter cooperi -77.38333 39.68333
... Accipiter cooperi -77.38333 39.68333
... Accipiter cooperi -75.99153 40.633335
... Accipiter cooperi -75.99153 40.633335
... """)
...
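
On Python 3 the StringIO module is gone and the class lives in io. A
rough equivalent of the stand-in above might look like this. The name
fake_open is an assumption, chosen so the built-in open() is not
shadowed, and the data is shortened to three rows.

```python
import io

def fake_open(filename):
    # Ignores filename and returns a file-like object over fixed sample data.
    return io.StringIO(
        "Species_name Longitude Latitude\n"
        "Abies concolor -106.601 35.868\n"
        "Accipiter cooperi -77.38333 39.68333\n"
        "Accipiter cooperi -77.38333 39.68333\n"
    )

# Iterating the object yields lines with their trailing newline,
# just as a real text file would.
sample_lines = list(fake_open("species.txt"))
```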

