[Numpy-discussion] creating zonal statistics from two arrays

josef.pktd at gmail.com josef.pktd at gmail.com
Wed Dec 8 12:48:51 EST 2010

On Wed, Dec 8, 2010 at 12:12 PM, Gregory, Matthew
<matt.gregory at oregonstate.edu> wrote:
> Hi all,
> Likely a very newbie type of question.  I'm using numpy with GDAL to calculate zonal statistics on images.  The basic approach is that I have a zone raster and a value raster which are aligned spatially and I am storing each zone's corresponding values in a dictionary, then calculating the statistics on that population.  (I'm well aware that this approach may have memory issues with large rasters ...)
> GDAL ReadAsArray gives you a chunk of raster data as a numpy array.  Currently I'm iterating over rows and columns of that chunk, but I'm guessing there's a better (and more numpy-like) way.
> zone_stats = {}
> zone_block = zone_band.ReadAsArray(x_off, y_off, x_size, y_size)
> value_block = value_band.ReadAsArray(x_off, y_off, x_size, y_size)
> for row in xrange(y_size):
>    for col in xrange(x_size):
>        zone = zone_block[row][col]
>        value = value_block[row][col]
>        try:
>            zone_stats[zone].append(value)
>        except KeyError:
>            zone_stats[zone] = [value]
> # Then calculate stats per zone
> ...

Just a thought since I'm not doing spatial statistics.

If you can create (integer) labels that assigns each point to a zone,
then you can treat it essentially as a 1d grouped data, and you could
use np.bincount to calculate some statistics, or alternatively
scipy.ndimage.measurements for some additional statistics.

This would avoid any python loop, but require a full label array.


> Thanks for all suggestions on how to make this better, especially if the initial approach I'm taking is flawed.
> matt
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

More information about the NumPy-Discussion mailing list