frequency of values in a field
Ethan Furman
ethan at stoneleaf.us
Wed Feb 9 15:28:48 EST 2011
noydb wrote:
> Paul Rubin wrote:
>> The Decimal module is pretty slow but is conceptually probably the right
>> way to do this. With just 50k records it shouldn't be too bad. With
>> more records you might look for a faster way.
>>
>> from decimal import Decimal as D
>> from collections import defaultdict
>>
>> records = ['3.14159','2.71828','3.142857']
>>
>> td = defaultdict(int)
>> for x in records:
>>     td[D(x).quantize(D('0.01'))] += 1
>>
>> print td
>>
>
> I played with this - it worked. Using Python 2.6, so Counter is not available.
>
> I require an output text file of sorted "key value" so I added
> (further code to write out to an actual textfile, not important here)
>>> for z in sorted(set(td)):
>>>     print z, td[z]
>
> So it seems the idea is to add all the records in the particular field
> of interest into a list (records). How does one do this in pure
> Python?
> Normally in my work with gis/arcgis sw, I would do a search cursor on
> the DBF file and add each value in the particular field into a list
> (to populate records above). Something like:
>
> --> import arcgisscripting
> --> # Create the geoprocessor object
> --> gp = arcgisscripting.create()
> --> records_list = []
> --> cur = gp.SearchCursor(dbfTable)
> --> row = cur.Next()
> --> while row:
> -->     value = row.particular_field
> -->     records_list.append(value)
> -->     row = cur.Next()
Are you trying to get away from arcgisscripting? There is a pure-Python
dbf package on PyPI (I know, I put it there ;) that you can use to
access the .dbf file in question (assuming it's in dBase III, dBase IV,
or FoxPro format).
http://pypi.python.org/pypi/dbf/0.88.16 if you're interested.
Using it, the code above could be:
-----------------------------------------------------
import dbf
from collections import defaultdict
from decimal import Decimal

table = dbf.Table('path/to/table/table_name')

freq = defaultdict(int)
for record in table:
    # Numeric/Float fields come back as floats; go through str() because
    # Decimal() does not accept a float directly on Python 2.6
    value = Decimal(str(record['field_of_interest']))
    key = value.quantize(Decimal('0.01'))
    freq[key] += 1

for z in sorted(freq):
    print z, freq[z]
-----------------------------------------------------
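For the sorted "key value" text file mentioned in the quoted message, here
is a minimal sketch of one way it might be done. The names count_rounded
and write_frequencies are made up for illustration (they are not part of
the dbf package), and the values can come from any iterable, such as the
field values read from the table.

```python
from collections import defaultdict
from decimal import Decimal


def count_rounded(values, step='0.01'):
    """Count how often each value occurs after rounding to `step`.

    Hypothetical helper: `values` may hold strings or floats.
    """
    quantum = Decimal(step)
    freq = defaultdict(int)
    for value in values:
        # str() first, so floats are accepted on older Pythons as well
        freq[Decimal(str(value)).quantize(quantum)] += 1
    return freq


def write_frequencies(freq, path):
    """Write one "key value" line per distinct key, in ascending key order."""
    with open(path, 'w') as out:
        for key in sorted(freq):
            out.write('%s %s\n' % (key, freq[key]))
```

Something like write_frequencies(count_rounded(values), 'frequencies.txt')
would then produce the file, where 'frequencies.txt' is just a placeholder
name.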
Numeric/Float field types are returned as Python floats*, so there may
be slight discrepancies between the stored value and the returned value.
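To illustrate the kind of discrepancy meant here, the following sketch
compares the short string form of a float with its exact binary value.
The stored text '2.675' is a made-up example, not taken from any real
table, and the '%.20f' formatting is only used to expose the digits the
float actually holds.

```python
from decimal import Decimal

stored = '2.675'            # hypothetical text as stored in the .dbf field
as_float = float(stored)    # what a Numeric/Float field would come back as

cent = Decimal('0.01')

# Via str(): the short repr '2.675' is used, matching the stored text
via_str = Decimal(str(as_float)).quantize(cent)

# Via the float's actual binary value, which sits slightly below 2.675
exact_digits = '%.20f' % as_float
via_exact = Decimal(exact_digits).quantize(cent)

print('%s %s %s' % (exact_digits, via_str, via_exact))
```

The two results land in different 0.01 buckets, so how the float is turned
into a Decimal can change the frequency counts for values that sit right
on a rounding boundary.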
Hope this helps.
~Ethan~
*Unless created with zero decimal places, in which case they are
returned as Python integers.