creating zonal statistics from two arrays

Hi all,
Likely a very newbie type of question. I'm using numpy with GDAL to calculate zonal statistics on images. The basic approach is that I have a zone raster and a value raster which are aligned spatially and I am storing each zone's corresponding values in a dictionary, then calculating the statistics on that population. (I'm well aware that this approach may have memory issues with large rasters ...)
GDAL ReadAsArray gives you a chunk of raster data as a numpy array. Currently I'm iterating over rows and columns of that chunk, but I'm guessing there's a better (and more numpy-like) way.
zone_stats = {} zone_block = zone_band.ReadAsArray(x_off, y_off, x_size, y_size) value_block = value_band.ReadAsArray(x_off, y_off, x_size, y_size) for row in xrange(y_size): for col in xrange(x_size): zone = zone_block[row][col] value = value_block[row][col] try: zone_stats[zone].append(value) except KeyError: zone_stats[zone] = [value]
# Then calculate stats per zone ...
Thanks for all suggestions on how to make this better, especially if the initial approach I'm taking is flawed.
matt

On Wed, Dec 8, 2010 at 12:12 PM, Gregory, Matthew matt.gregory@oregonstate.edu wrote:
Hi all,
Likely a very newbie type of question. I'm using numpy with GDAL to calculate zonal statistics on images. The basic approach is that I have a zone raster and a value raster which are aligned spatially and I am storing each zone's corresponding values in a dictionary, then calculating the statistics on that population. (I'm well aware that this approach may have memory issues with large rasters ...)
GDAL ReadAsArray gives you a chunk of raster data as a numpy array. Currently I'm iterating over rows and columns of that chunk, but I'm guessing there's a better (and more numpy-like) way.
zone_stats = {} zone_block = zone_band.ReadAsArray(x_off, y_off, x_size, y_size) value_block = value_band.ReadAsArray(x_off, y_off, x_size, y_size) for row in xrange(y_size): for col in xrange(x_size): zone = zone_block[row][col] value = value_block[row][col] try: zone_stats[zone].append(value) except KeyError: zone_stats[zone] = [value]
# Then calculate stats per zone ...
Just a thought since I'm not doing spatial statistics.
If you can create (integer) labels that assigns each point to a zone, then you can treat it essentially as a 1d grouped data, and you could use np.bincount to calculate some statistics, or alternatively scipy.ndimage.measurements for some additional statistics.
This would avoid any python loop, but require a full label array.
Josef
Thanks for all suggestions on how to make this better, especially if the initial approach I'm taking is flawed.
matt
NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

On 12/8/2010 9:48 AM, josef.pktd@gmail.com wrote:
Just a thought since I'm not doing spatial statistics.
If you can create (integer) labels that assigns each point to a zone, then you can treat it essentially as a 1d grouped data, and you could use np.bincount to calculate some statistics, or alternatively scipy.ndimage.measurements for some additional statistics.
This would avoid any python loop, but require a full label array.
Josef,
The measurements module did the trick; thanks for the pointer. I just stumbled across a very similar thread on the scipy listserv that you answered basically the same question with some nice code (sorry for the redundancy):
http://mail.scipy.org/pipermail/scipy-user/2009-February/019850.html
BTW, the OP on that thread (Jose Gomez-Dans) has a script out there for doing just this type of operation that I was after:
http://sites.google.com/site/spatialpython/zonal-statistics
thanks, matt
participants (3)
-
Gregory, Matthew
-
josef.pktd@gmail.com
-
Matt Gregory