Selecting unique values

Peter Otten __peter__ at web.de
Tue Jul 26 11:03:43 CEST 2011


Kumar Mainali wrote:

> I have a dataset with occurrence records of multiple species. I need to
> get rid of multiple listings of the same occurrence point for a species
> (as you see below in red and blue typeface). How do I create a dataset
> only with unique set of longitude and latitude for each species? Thanks in
> advance.
> 
> Species_name Longitude Latitude
> Abies concolor -106.601 35.868
> Abies concolor -106.493 35.9682
> Abies concolor -106.489 35.892
> Abies concolor -106.496 35.8542
> Accipiter cooperi -119.688 34.4339
> Accipiter cooperi -119.792 34.5069
> Accipiter cooperi -118.797 34.2581
> Accipiter cooperi -77.38333 39.68333
> Accipiter cooperi -77.38333 39.68333
> Accipiter cooperi -75.99153 40.633335
> Accipiter cooperi -75.99153 40.633335

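Because every record sits on its own line, it is enough to drop lines that
have already been seen. A generator that remembers earlier lines in a set
does that and keeps the original order:

>>> def uniquify(items):
...     seen = set()  # items already yielded
...     for item in items:
...         if item not in seen:
...             seen.add(item)
...             yield item
...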
>>> import sys
>>> sys.stdout.writelines(uniquify(open("species.txt")))
Species_name Longitude Latitude
Abies concolor -106.601 35.868
Abies concolor -106.493 35.9682
Abies concolor -106.489 35.892
Abies concolor -106.496 35.8542
Accipiter cooperi -119.688 34.4339
Accipiter cooperi -119.792 34.5069
Accipiter cooperi -118.797 34.2581
Accipiter cooperi -77.38333 39.68333
Accipiter cooperi -75.99153 40.633335
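
If the result should end up in a new file rather than on stdout, the same
generator can feed writelines() on an output file. This is only a sketch:
write_unique() and the name species_unique.txt are made up for illustration,
and the demo writes its own small input into a temporary directory.

```python
import os
import tempfile


def uniquify(items):
    """Yield each item once, in order of first appearance."""
    seen = set()
    for item in items:
        if item not in seen:
            seen.add(item)
            yield item


def write_unique(src_path, dst_path):
    """Copy src_path to dst_path, keeping only the first copy of each line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(uniquify(src))


# Demo in a throwaway directory (hypothetical file names).
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "species.txt")
dst_path = os.path.join(workdir, "species_unique.txt")

with open(src_path, "w") as f:
    f.write(
        "Species_name Longitude Latitude\n"
        "Accipiter cooperi -77.38333 39.68333\n"
        "Accipiter cooperi -77.38333 39.68333\n"
        "Accipiter cooperi -75.99153 40.633335\n"
    )

write_unique(src_path, dst_path)

with open(dst_path) as f:
    result = f.read().splitlines()
```

Only lines that match character for character are treated as duplicates
here, so a stray extra space would make two records look different.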

If duplicates have to be detected on only part of each line, add a key
function that extracts the part to compare:

>>> def uniquify(items, key=None):
...     seen = set()
...     for item in items:
...         if key is None:
...             keyval = item
...         else:
...             keyval = key(item)
...         if keyval not in seen:
...             seen.add(keyval)
...             yield item
...

Unique latitudes (the key is the last whitespace-separated field of each
line):

>>> sys.stdout.writelines(uniquify(open("species.txt"),
...     key=lambda s: s.rsplit(None, 1)[-1]))
Species_name Longitude Latitude
Abies concolor -106.601 35.868
Abies concolor -106.493 35.9682
Abies concolor -106.489 35.892
Abies concolor -106.496 35.8542
Accipiter cooperi -119.688 34.4339
Accipiter cooperi -119.792 34.5069
Accipiter cooperi -118.797 34.2581
Accipiter cooperi -77.38333 39.68333
Accipiter cooperi -75.99153 40.633335

Unique species names (the key is everything before the last two fields):

>>> sys.stdout.writelines(uniquify(open("species.txt"),
...     key=lambda s: s.rsplit(None, 2)[0]))
Species_name Longitude Latitude
Abies concolor -106.601 35.868
Accipiter cooperi -119.688 34.4339
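
For the original question (one row per species and coordinate pair) the key
could also be the parsed record itself, which tolerates differences in
spacing or trailing zeros. A rough sketch in Python 3, assuming the last two
fields are always numeric; record_key() is a name made up here, and the
sample rows are shortened and altered to show the effect:

```python
import io


def uniquify(items, key=None):
    """Yield items in order, skipping any whose key was seen before."""
    seen = set()
    for item in items:
        keyval = item if key is None else key(item)
        if keyval not in seen:
            seen.add(keyval)
            yield item


def record_key(line):
    """Return (species, longitude, latitude) for one data line."""
    species, lon, lat = line.rsplit(None, 2)
    return species, float(lon), float(lat)


# Illustrative data: rows 1 and 2 differ only in spacing and a trailing zero,
# row 3 has the same coordinates but belongs to another species.
sample = io.StringIO(
    "Species_name Longitude Latitude\n"
    "Accipiter cooperi -77.38333 39.68333\n"
    "Accipiter cooperi  -77.383330 39.68333\n"
    "Abies concolor -77.38333 39.68333\n"
)

header = next(sample)  # the header is not numeric, so take it off first
unique_records = list(uniquify(sample, key=record_key))
```

The second Accipiter row is dropped, while the Abies row survives because the
species is part of the key. Comparing floats this way only merges values that
parse to exactly the same number; nearby points are still kept apart.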

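Bonus: open() in the sessions above is not the built-in. To keep the examples
self-contained I shadowed it with a function that ignores the filename and
returns the sample data (using Python 2's StringIO module):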

>>> from StringIO import StringIO
>>> def open(filename):
...     return StringIO("""Species_name Longitude Latitude
... Abies concolor -106.601 35.868
... Abies concolor -106.493 35.9682
... Abies concolor -106.489 35.892
... Abies concolor -106.496 35.8542
... Accipiter cooperi -119.688 34.4339
... Accipiter cooperi -119.792 34.5069
... Accipiter cooperi -118.797 34.2581
... Accipiter cooperi -77.38333 39.68333
... Accipiter cooperi -77.38333 39.68333
... Accipiter cooperi -75.99153 40.633335
... Accipiter cooperi -75.99153 40.633335
... """)
...
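
On Python 3 the StringIO class lives in the io module. A possible equivalent
of the stand-in above, given a different name (open_sample, made up here) so
that the built-in open() stays usable:

```python
import io

SAMPLE = """Species_name Longitude Latitude
Abies concolor -106.601 35.868
Abies concolor -106.493 35.9682
Abies concolor -106.489 35.892
Abies concolor -106.496 35.8542
Accipiter cooperi -119.688 34.4339
Accipiter cooperi -119.792 34.5069
Accipiter cooperi -118.797 34.2581
Accipiter cooperi -77.38333 39.68333
Accipiter cooperi -77.38333 39.68333
Accipiter cooperi -75.99153 40.633335
Accipiter cooperi -75.99153 40.633335
"""


def open_sample(filename):
    """Hypothetical stand-in for open(): ignore filename, return the sample."""
    return io.StringIO(SAMPLE)


# The returned object can be iterated line by line, like a real file.
lines = open_sample("species.txt").readlines()
```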




