frequency of values in a field

Ethan Furman ethan at
Wed Feb 9 15:28:48 EST 2011

noydb wrote:
> Paul Rubin wrote:
>> The Decimal module is pretty slow but is conceptually probably the right
>> way to do this.  With just 50k records it shouldn't be too bad.  With
>> more records you might look for a faster way.
>>     from decimal import Decimal as D
>>     from collections import defaultdict
>>     records = ['3.14159','2.71828','3.142857']
>>     td = defaultdict(int)
>>     for x in records:
>>         td[D(x).quantize(D('0.01'))] += 1
>>     print td
> I played with this - it worked.  Using Python 2.6 so counter no good.
> I require an output text file of sorted "key value" so I added
> (further code to write out to an actual textfile, not important here)
>>> for z in sorted(set(td)):
>>>     print z, td[z]
> So it seems the idea is to add all the records in the particular field
> of interest into a list (record).  How does one do this in pure
> Python?
> Normally in my work with gis/arcgis sw, I would do a search cursor on
> the DBF file and add each value in the particular field into a list
> (to populate records above).  Something like:
> --> import arcgisscripting
> --> # Create the geoprocessor object
> --> gp = arcgisscripting.create()
> --> records_list = []
> --> cur = gp.SearchCursor(dbfTable)
> --> row = cur.Next()
> --> while row:
> -->    value = row.particular_field
> -->    records_list.append(value)

Are you trying to get away from arcgisscripting?  If so, there is a pure-Python
dbf package on PyPI (I know, I put it there ;) that you can use to
access the .dbf file in question, assuming it's in dBase III, IV, or
FoxPro format.

Using it, the code above could be:

import dbf
from collections import defaultdict
from decimal import Decimal

table = dbf.Table('path/to/table/table_name')

freq = defaultdict(int)
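for record in table:
    # Numeric/Float fields come back as floats; go through str() because
    # Decimal() does not accept a float directly on Python 2.6
    value = Decimal(str(record['field_of_interest']))
    key = value.quantize(Decimal('0.01'))
    freq[key] += 1

for z in sorted(freq):
    print z, freq[z]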
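
If you want to try the counting logic without a .dbf file at hand, here is a
rough sketch with a plain list standing in for the table.  The names
count_rounded, as_lines and write_report are made up for this example (they
are not part of the dbf package), and the sample values are the ones from
Paul's post.

```python
from collections import defaultdict
from decimal import Decimal


def count_rounded(values, places='0.01'):
    """Count how often each value occurs after rounding to `places`."""
    step = Decimal(places)
    freq = defaultdict(int)
    for value in values:
        # str() lets this accept strings, ints, or floats alike
        freq[Decimal(str(value)).quantize(step)] += 1
    return freq


def as_lines(freq):
    """Return sorted "key value" strings, one per distinct rounded value."""
    return ['%s %s' % (key, freq[key]) for key in sorted(freq)]


def write_report(path, freq):
    """Write the sorted "key value" lines to a text file (hypothetical helper)."""
    with open(path, 'w') as out:
        out.write('\n'.join(as_lines(freq)) + '\n')


sample = ['3.14159', '2.71828', '3.142857']
freq_sample = count_rounded(sample)
lines = as_lines(freq_sample)
```

With the three sample values, lines should come out as '2.72 1' followed by
'3.14 2'.  Swapping sample for the values pulled from the table gives the
real counts.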


Numeric/Float field types are returned as Python floats*, so there may
be slight discrepancies between the stored value and the returned value.
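
To give an idea of what such a discrepancy can look like once you round, here
is a small sketch.  The value 2.675 is just an illustrative number I picked,
not something from your data, and passing a float straight to Decimal() needs
Python 2.7 or later (on 2.6 only the str() route works).

```python
from decimal import Decimal

CENT = Decimal('0.01')

stored_text = '2.675'          # hypothetical value as written in the file
as_float = float(stored_text)  # what a float-returning reader hands back

# Rounding the text form: exactly 2.675, a tie, default rounding is half-even
from_text = Decimal(stored_text).quantize(CENT)

# Rounding the float's exact binary value, which is slightly below 2.675
from_float = Decimal(as_float).quantize(CENT)

# Going through str() recovers the short printed form first
via_str = Decimal(str(as_float)).quantize(CENT)
```

Here from_text and via_str end up as 2.68 while from_float ends up as 2.67, so
a value sitting right on a rounding boundary can land in a different bucket
depending on how the float is converted.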

Hope this helps.


*Unless created with zero decimal places, in which case they are
returned as Python integers.
