Selecting unique values
Peter Otten
__peter__ at web.de
Tue Jul 26 05:03:43 EDT 2011
Kumar Mainali wrote:
> I have a dataset with occurrence records of multiple species. I need to
> get rid of multiple listings of the same occurrence point for a species
> (as you see below in red and blue typeface). How do I create a dataset
> only with unique set of longitude and latitude for each species? Thanks in
> advance.
>
> Species_name Longitude Latitude
> Abies concolor -106.601 35.868
> Abies concolor -106.493 35.9682
> Abies concolor -106.489 35.892
> Abies concolor -106.496 35.8542
> Accipiter cooperi -119.688 34.4339
> Accipiter cooperi -119.792 34.5069
> Accipiter cooperi -118.797 34.2581
> Accipiter cooperi -77.38333 39.68333
> Accipiter cooperi -77.38333 39.68333
> Accipiter cooperi -75.99153 40.633335
> Accipiter cooperi -75.99153 40.633335
>>> def uniquify(items):
...     seen = set()
...     for item in items:
...         if item not in seen:
...             seen.add(item)
...             yield item
...
>>> import sys
>>> sys.stdout.writelines(uniquify(open("species.txt")))
Species_name Longitude Latitude
Abies concolor -106.601 35.868
Abies concolor -106.493 35.9682
Abies concolor -106.489 35.892
Abies concolor -106.496 35.8542
Accipiter cooperi -119.688 34.4339
Accipiter cooperi -119.792 34.5069
Accipiter cooperi -118.797 34.2581
Accipiter cooperi -77.38333 39.68333
Accipiter cooperi -75.99153 40.633335
If you need to massage the lines a bit before comparing them, add a key function:
>>> def uniquify(items, key=None):
...     seen = set()
...     for item in items:
...         if key is None:
...             keyval = item
...         else:
...             keyval = key(item)
...         if keyval not in seen:
...             seen.add(keyval)
...             yield item
...
Unique latitudes:
>>> sys.stdout.writelines(uniquify(open("species.txt"), key=lambda s: s.rsplit(None, 1)[-1]))
Species_name Longitude Latitude
Abies concolor -106.601 35.868
Abies concolor -106.493 35.9682
Abies concolor -106.489 35.892
Abies concolor -106.496 35.8542
Accipiter cooperi -119.688 34.4339
Accipiter cooperi -119.792 34.5069
Accipiter cooperi -118.797 34.2581
Accipiter cooperi -77.38333 39.68333
Accipiter cooperi -75.99153 40.633335
Unique species names:
>>> sys.stdout.writelines(uniquify(open("species.txt"), key=lambda s: s.rsplit(None, 2)[0]))
Species_name Longitude Latitude
Abies concolor -106.601 35.868
Accipiter cooperi -119.688 34.4339
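
The question asked for unique longitude/latitude pairs per species, so the key can also be a tuple of all three fields. That way rows differing only in spacing still compare equal. What follows is a minimal sketch, not a definitive solution. It assumes whitespace-separated columns with the two coordinates last, and no blank lines in the input. record_key is a hypothetical helper name, and the second sample row has extra spaces added purely for illustration. Coordinates are compared as strings, so "-77.38333" and "-77.383330" would still count as different.

```python
def uniquify(items, key=None):
    """Yield items in order, skipping any whose key was already seen."""
    seen = set()
    for item in items:
        keyval = item if key is None else key(item)
        if keyval not in seen:
            seen.add(keyval)
            yield item


def record_key(line):
    """Hypothetical helper: (species, longitude, latitude) for one line."""
    # Split from the right so a multi-word species name stays in one piece.
    species, longitude, latitude = line.rsplit(None, 2)
    # Collapse runs of whitespace inside the species name.
    return " ".join(species.split()), longitude, latitude


lines = [
    "Accipiter cooperi -77.38333 39.68333\n",
    "Accipiter  cooperi   -77.38333 39.68333\n",  # same record, extra spaces
    "Accipiter cooperi -75.99153 40.633335\n",
]
unique_lines = list(uniquify(lines, key=record_key))
```

With a real file, the list can be replaced by the open file object, as in the examples above.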
Bonus: open() is not the built-in here; for the examples above I replaced it with a stand-in that returns the sample data:
>>> from StringIO import StringIO
>>> def open(filename):
...     return StringIO("""Species_name Longitude Latitude
... Abies concolor -106.601 35.868
... Abies concolor -106.493 35.9682
... Abies concolor -106.489 35.892
... Abies concolor -106.496 35.8542
... Accipiter cooperi -119.688 34.4339
... Accipiter cooperi -119.792 34.5069
... Accipiter cooperi -118.797 34.2581
... Accipiter cooperi -77.38333 39.68333
... Accipiter cooperi -77.38333 39.68333
... Accipiter cooperi -75.99153 40.633335
... Accipiter cooperi -75.99153 40.633335
... """)
...
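
The StringIO module imported above exists only in Python 2. On Python 3 the same stand-in could be sketched with io.StringIO. This is an illustration under assumptions: fake_open and SAMPLE are hypothetical names chosen here to avoid shadowing the built-in open(), and the sample is a shortened copy of the data from the question.

```python
import io

# Shortened copy of the sample data, including one duplicated record.
SAMPLE = """\
Species_name Longitude Latitude
Abies concolor -106.601 35.868
Accipiter cooperi -77.38333 39.68333
Accipiter cooperi -77.38333 39.68333
"""


def fake_open(filename):
    """Hypothetical stand-in for open(): ignores filename, returns sample data."""
    return io.StringIO(SAMPLE)


def uniquify(items):
    """Yield each item the first time it appears, preserving order."""
    seen = set()
    for item in items:
        if item not in seen:
            seen.add(item)
            yield item


result = list(uniquify(fake_open("species.txt")))
```

Iterating over an io.StringIO object yields lines with their trailing newlines, just like a real text file, so uniquify() needs no changes.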