[Numpy-discussion] creating zonal statistics from two arrays

Gregory, Matthew matt.gregory at oregonstate.edu
Wed Dec 8 12:12:44 EST 2010

Hi all,

Likely a very newbie type of question.  I'm using numpy with GDAL to calculate zonal statistics on images.  The basic approach is that I have a zone raster and a value raster which are aligned spatially and I am storing each zone's corresponding values in a dictionary, then calculating the statistics on that population.  (I'm well aware that this approach may have memory issues with large rasters ...)

GDAL ReadAsArray gives you a chunk of raster data as a numpy array.  Currently I'm iterating over rows and columns of that chunk, but I'm guessing there's a better (and more numpy-like) way.

zone_stats = {}
zone_block = zone_band.ReadAsArray(x_off, y_off, x_size, y_size)
value_block = value_band.ReadAsArray(x_off, y_off, x_size, y_size)
for row in xrange(y_size):
    for col in xrange(x_size):
        zone = zone_block[row][col]
        value = value_block[row][col]
        except KeyError:
            zone_stats[zone] = [value]

# Then calculate stats per zone

Thanks for all suggestions on how to make this better, especially if the initial approach I'm taking is flawed.


More information about the NumPy-Discussion mailing list