frequency of values in a field

noydb jenn.duerr at gmail.com
Wed Feb 9 15:44:45 EST 2011


On Feb 9, 3:28 pm, Ethan Furman <et... at stoneleaf.us> wrote:
> noydb wrote:
>
>  > Paul Rubin wrote:
>
> >> The Decimal module is pretty slow but is conceptually probably the right
> >> way to do this.  With just 50k records it shouldn't be too bad.  With
> >> more records you might look for a faster way.
>
> >>     from decimal import Decimal as D
> >>     from collections import defaultdict
>
> >>     records = ['3.14159','2.71828','3.142857']
>
> >>     td = defaultdict(int)
> >>     for x in records:
> >>         td[D(x).quantize(D('0.01'))] += 1
>
> >>     print td
>
> > I played with this - it worked.  I'm on Python 2.6, so
> > collections.Counter (new in 2.7) isn't an option.
>
> > I require an output text file of sorted "key value" so I added
> > (further code to write out to an actual textfile, not important here)
> >>> for z in sorted(set(td)):
> >>>     print z, td[z]
>
> > So it seems the idea is to add all the values in the particular field
> > of interest into a list (records).  How does one do this in pure
> > Python?
> > Normally in my work with gis/arcgis sw, I would do a search cursor on
> > the DBF file and add each value in the particular field into a list
> > (to populate records above).  Something like:
>
> > --> import arcgisscripting
> > --> # Create the geoprocessor object
> > --> gp = arcgisscripting.create()
> > --> records_list = []
> > --> cur = gp.SearchCursor(dbfTable)
> > --> row = cur.Next()
> > --> while row:
> > -->    value = row.particular_field
> > -->    records_list.append(value)
> > -->    row = cur.Next()  # advance the cursor, otherwise the loop never ends
>
> Are you trying to get away from arcgisscripting?  There is a pure python
> dbf package on PyPI (I know, I put it there ;) that you can use to
> access the .dbf file in question (assuming it's a dBase III, IV, or
> FoxPro format).
>
> http://pypi.python.org/pypi/dbf/0.88.16 if you're interested.
>
> Using it, the code above could be:
>
> -----------------------------------------------------
> import dbf
> from collections import defaultdict
> from decimal import Decimal
>
> table = dbf.Table('path/to/table/table_name')
>
> freq = defaultdict(int)
> for record in table:
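>      # str() first: Decimal() on Python 2.6 rejects a float, and going
>      # through the string form avoids picking up binary float noise
>      value = Decimal(str(record['field_of_interest']))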
>      key = value.quantize(Decimal('0.01'))
>      freq[key] += 1
>
> for z in sorted(freq):
>      print z, freq[z]
>
> -----------------------------------------------------
>
> Numeric/Float field types are returned as python floats*, so there may
> be slight discrepancies between the stored value and the returned value.
>
> Hope this helps.
>
> ~Ethan~
>
> *Unless created with zero decimal places, in which case they are
> returned as python integers.



Oops, didn't see this before my last post.

Thanks!  I'll try this, looks good, makes sense.
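
For the part about writing the sorted "key value" lines to a text file, a minimal standard-library sketch might look like the one below. The function names field_frequencies and write_frequencies are hypothetical, not taken from the thread or from the dbf package. Values go through str() on the assumption that a field may come back as a Python float.

```python
from collections import defaultdict
from decimal import Decimal


def field_frequencies(values, places="0.01"):
    """Count how often each value occurs after rounding to `places`."""
    step = Decimal(places)
    freq = defaultdict(int)
    for value in values:
        # Assumption: values may be floats or numeric strings. str() gives
        # Decimal the short printed form instead of the binary float
        # expansion (and Decimal() on Python 2.6 rejects a float outright).
        freq[Decimal(str(value)).quantize(step)] += 1
    return freq


def write_frequencies(freq, path):
    """Write one "key value" line per distinct key, in ascending key order."""
    with open(path, "w") as out:
        for key in sorted(freq):
            out.write("%s %d\n" % (key, freq[key]))
```

Something like write_frequencies(field_frequencies(records), 'frequencies.txt') should then give one line per rounded value, smallest first, where records is any iterable of field values (for example, the values pulled from each record of the table).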


