Thank you Zack:
By raster data I mean classified slope gradient (derived from a dem), landuse-landcover, lithology etc. A crosstabulation analysis will give me a table with the common areas for each class from each raster and this will go into other analysis. I can do it with other softwares (like ArcGIS DEsktop etc), but I would like to have all with numpy or to build something on top of numpy
Thanks's again
Ionut
----- Original Message ----- From: "Zachary Pincus" zachary.pincus@yale.edu To: "Discussion of Numerical Python" numpy-discussion@scipy.org Sent: Wednesday, July 14, 2010 9:42:49 PM GMT +02:00 Athens, Beirut, Bucharest, Istanbul Subject: Re: [Numpy-discussion] Crosstabulation
Hi Ionut,
Check out the "tabular" package: http://parsemydata.com/tabular/index.html
It seems to be basically what you want... it does "pivot tables" (aka crosstabulation), it's built on top of numpy, and has simple data IO tools.
Also check out this discussion on "pivot tables" from the numpy list a while ago: http://mail.scipy.org/pipermail/numpy-discussion/2007-August/028739.html
One question -- what do you mean by "raster data"? In my arena, that usually means "images"... and I'm not sure what a crosstabulation on image data would mean!
Zach
On Jul 14, 2010, at 2:28 PM, Ionut Sandric wrote:
Sorry, the first email was sent before to finish it...
Hi:
I have two raster data and I would like to do a crosstabulation between them and export the results to a table in a text file. Is it possible to do it with NumPy? Does someone have an example?
Thank you,
Ionut
NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
2010/7/14 Ionut Sandric sandricionut@yahoo.com:
By raster data I mean classified slope gradient (derived from a dem), landuse-landcover, lithology etc. A crosstabulation analysis will give me a table with the common areas for each class from each raster and this will go into other analysis. I can do it with other softwares (like ArcGIS DEsktop etc), but I would like to have all with numpy or to build something on top of numpy
I'm afraid also Zach does not understand what you are talking about ... So my first question (please bear with me) would be: What's a dem? (n/a in my dictionary) And sorry for the cross-talk on the other first post by you ...
And by slope gradient you mean second derivative?
Friedrich
On Sat, Jul 17, 2010 at 13:11, Friedrich Romstedt friedrichromstedt@gmail.com wrote:
2010/7/14 Ionut Sandric sandricionut@yahoo.com:
By raster data I mean classified slope gradient (derived from a dem), landuse-landcover, lithology etc. A crosstabulation analysis will give me a table with the common areas for each class from each raster and this will go into other analysis. I can do it with other softwares (like ArcGIS DEsktop etc), but I would like to have all with numpy or to build something on top of numpy
I'm afraid also Zach does not understand what you are talking about ... So my first question (please bear with me) would be: What's a dem?
Digital Elevation Map.
(n/a in my dictionary) And sorry for the cross-talk on the other first post by you ...
And by slope gradient you mean second derivative?
No, the first derivative. "Slope gradient" is a reasonably common, albeit somewhat redundant, idiom meaning the gradient of an elevation map.
2010/7/17 Robert Kern robert.kern@gmail.com:
On Sat, Jul 17, 2010 at 13:11, Friedrich Romstedt friedrichromstedt@gmail.com wrote:
2010/7/14 Ionut Sandric sandricionut@yahoo.com: I'm afraid also Zach does not understand what you are talking about ... So my first question (please bear with me) would be: What's a dem?
Digital Elevation Map.
(n/a in my dictionary) And sorry for the cross-talk on the other first post by you ...
And by slope gradient you mean second derivative?
No, the first derivative. "Slope gradient" is a reasonably common, albeit somewhat redundant, idiom meaning the gradient of an elevation map.
Thanks Robert, that clarifies a lot.
But still I don't understand how the crosstabulation shall work. What are the "classes"?
Friedrich
Hi Friedrich:
For land-use a class would be for example forest, other would be orchard etc. For Slope gradient I would have values which <3 and between 3 and 7 etc. So, I will have 2 raster data with, let's say, 3 classes each: forest, orchards and built-up area and for slope gradient: 0-3, 3-15, 15-35. The cross-tabulation analysis should give me a table like:
forest orchards built-up 0-3 10 20 15 3-15 5 10 20 15-35 5 15 15
where the numbers represents all the common cells, for example: 10 cells with forest correspond to 10 cells with 0-3 slope gradient interval and so on (by cells I mean the pixel from a raster data)
The analysis is better illustrated here: http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?TopicName=tabulate_area
Ionut
________________________________ From: Friedrich Romstedt friedrichromstedt@gmail.com To: Discussion of Numerical Python numpy-discussion@scipy.org Sent: Sun, July 18, 2010 12:09:04 AM Subject: Re: [Numpy-discussion] Crosstabulation
2010/7/17 Robert Kern robert.kern@gmail.com:
On Sat, Jul 17, 2010 at 13:11, Friedrich Romstedt friedrichromstedt@gmail.com wrote:
2010/7/14 Ionut Sandric sandricionut@yahoo.com: I'm afraid also Zach does not understand what you are talking about ... So my first question (please bear with me) would be: What's a dem?
Digital Elevation Map.
(n/a in my dictionary) And sorry for the cross-talk on the other first post by you ...
And by slope gradient you mean second derivative?
No, the first derivative. "Slope gradient" is a reasonably common, albeit somewhat redundant, idiom meaning the gradient of an elevation map.
Thanks Robert, that clarifies a lot.
But still I don't understand how the crosstabulation shall work. What are the "classes"?
Friedrich _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
On 07/19/2010 09:55 AM, sandric ionut wrote:
Hi Friedrich:
For land-use a class would be for example forest, other would be orchard etc. For Slope gradient I would have values which <3 and between 3 and 7 etc. So, I will have 2 raster data with, let's say, 3 classes each: forest, orchards and built-up area and for slope gradient: 0-3, 3-15, 15-35. The cross-tabulation analysis should give me a table like:
forest orchards built-up 0-3 10 &n bsp; 20 15 3-15 5 10 20 15-35 5 15 15
where the numbers represents all the common cells, for example: 10 cells with forest correspond to 10 cells with 0-3 slope gradient interval and so on (by cells I mean the pixel from a raster data)
The analysis is better illustrated here: http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?TopicName=tabulate_area
Ionut
Ha, we're messing up lingo's here :-) Need to switch to GIS (geographic information systems) dialect. - DEM = digital elevation map, usually a 2d array ('raster') with elevation values (for a certain area on earth) - slope gradient = the slope (literally, not as in math speak) of the surface depicted by the elevation map. Mostly defined as the maximum slope within a certain moving window; several competing methods to estimate/calculate slope exist. - land use/cover class: raster (array) where each cell ('pixel') has an integer value, which maps to some well defined land use at that location (e.g. 0 means sea, 1 means forest, 2 means agriculture, etc) - crosstabulation usually means some kind of 2d histogram, where the total number of raster cells with a certain value (e.g. depicting 'land use class') 'within' a range of values of another raster with the same shape (and matching locations). Like: how many cells of forest lie withing a slope range of 0-10 degrees?
Right. On to the answers. I think you should look into numpy.histogram2d, where you can do exactly what you want. Your land use array is x, your slope gradient array = y, then you define the bins as your class numbers (for x) and your slope gradient ranges (for y), and you will get a pixel count for each bin combination.
see: http://docs.scipy.org/doc/numpy/reference/generated/numpy.histogram2d.html
Regards, Vincent Schut.
*From:* Friedrich Romstedt friedrichromstedt@gmail.com *To:* Discussion of Numerical Python numpy-discussion@scipy.org *Sent:* Sun, July 18, 2010 12:09:04 AM *Subject:* Re: [Numpy-discussion] Crosstabulation
2010/7/17 Robert Kern <robert.kern@gmail.com mailto:robert.kern@gmail.com>:
On Sat, Jul 17, 2010 at 13:11, Friedrich Romstedt <friedrichromstedt@gmail.com mailto:friedrichromstedt@gmail.com> wrote:
2010/7/14 Ionut Sandric <sandricionut@yahoo.com
mailto:sandricionut@yahoo.com>:
I'm afraid also Zach does not understand what you are talking about ... So my first question (please bear with me) would be: What's a dem?
Digital Elevation Map.
(n/a in my dictionary) And sorry for the cross-talk on the other first post by you ...
And by slope gradient you mean second derivative?
No, the first derivative. "Slope gradient" is a reasonably common, albeit somewhat redundant, idiom meaning the gradient of an elevation map.
Thanks Robert, that clarifies a lot.
But still I don't understand how the crosstabulation shall work. What are the "classes"?
Friedrich _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org mailto:NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
2010/7/19 sandric ionut sandricionut@yahoo.com:
For land-use a class would be for example forest, other would be orchard etc. For Slope gradient I would have values which <3 and between 3 and 7 etc. So, I will have 2 raster data with, let's say, 3 classes each: forest, orchards and built-up area and for slope gradient: 0-3, 3-15, 15-35. The cross-tabulation analysis should give me a table like:
forest orchards built-up 0-3 10 20 15 3-15 5 10 20 15-35 5 15 15
where the numbers represents all the common cells, for example: 10 cells with forest correspond to 10 cells with 0-3 slope gradient interval and so on (by cells I mean the pixel from a raster data)
Okay everything is clear now. I would suggestest looping over the cells of the table. E.g. when:
landuse_catmap slope_map
are the respective maps, you can categorise the slope_map with
slope_catmap = 0 * (0 <= slope_map) * (slope_map < 3) + 1 * (3 <= slope_map) * (slope_map < 15) + 2 * (15 <= slope_map)
to get categories 0, 1, and 2, where the zero case is included here only for illustration, since it's just zero it can be omitted.
Then you are maybe supposed to do:
table = numpy.zeros((N_slope_cats, N_landuse_cats))
and finally the looping over its elements, where the looping over the map cells is done entirely by numpy:
for slope_cat in xrange(0, N_slope_cats): for landuse_cat in xrange(0, N_landuse_cats): table[slope_cat, landuse_cat] = \ ((slope_catmap == slope_cat) * \ (landuse_catmap == landuse_cat)).sum()
I believe there is some slightly faster but more obscure way doing this table creation in one single large step by putting the maps into a hyperdimensianal array but I also believe this is beyond our scope here if this is already fast enough.
Friedrich
On 07/19/2010 10:14 PM, Friedrich Romstedt wrote:
2010/7/19 sandric ionutsandricionut@yahoo.com:
For land-use a class would be for example forest, other would be orchard etc. For Slope gradient I would have values which<3 and between 3 and 7 etc. So, I will have 2 raster data with, let's say, 3 classes each: forest, orchards and built-up area and for slope gradient: 0-3, 3-15, 15-35. The cross-tabulation analysis should give me a table like:
forest orchards built-up
0-3 10 20 15 3-15 5 10 20 15-35 5 15 15
where the numbers represents all the common cells, for example: 10 cells with forest correspond to 10 cells with 0-3 slope gradient interval and so on (by cells I mean the pixel from a raster data)
Okay everything is clear now. I would suggestest looping over the cells of the table. E.g. when:
landuse_catmap slope_map
are the respective maps, you can categorise the slope_map with
slope_catmap = 0 * (0<= slope_map) * (slope_map< 3) + 1 * (3<= slope_map) * (slope_map< 15) + 2 * (15<= slope_map)
to get categories 0, 1, and 2, where the zero case is included here only for illustration, since it's just zero it can be omitted.
Then you are maybe supposed to do:
table = numpy.zeros((N_slope_cats, N_landuse_cats))
and finally the looping over its elements, where the looping over the map cells is done entirely by numpy:
for slope_cat in xrange(0, N_slope_cats): for landuse_cat in xrange(0, N_landuse_cats): table[slope_cat, landuse_cat] = \ ((slope_catmap == slope_cat) * \ (landuse_catmap == landuse_cat)).sum()
I believe there is some slightly faster but more obscure way doing this table creation in one single large step by putting the maps into a hyperdimensianal array but I also believe this is beyond our scope here if this is already fast enough.
That 'slightly faster but more obscure way' would be histogram2d :-) Especially when you have many landuse and/or slope classes over which to loop, histogram2d will be more than just slightly faster. About obscure, well, I leave that to the user. Probably depends on your background. Personally I use it all the time. Loops are evil :-)
your 'more obscure' code would become something like this:
slope_bin_edges = [0, 3, 15, 35] landuse_bin_edges = [0, 1, 2, 3] crosstab = numpy.histogram2d(landuse, slope, bins=(landuse_bin_edges, slope_bin_edges))
VS.
Friedrich
2010/7/20 Vincent Schut schut@sarvision.nl:
slope_bin_edges = [0, 3, 15, 35] landuse_bin_edges = [0, 1, 2, 3] crosstab = numpy.histogram2d(landuse, slope, bins=(landuse_bin_edges, slope_bin_edges))
I like it! I guess the actual bins are [0, 3), [3, 15) and [15, 35)?
From the docs, that is not so clear. Either way, the 35 or the 0
would be an outlier. So maybe it's better to specifiy a somewhat enlarged end boundary. (And, of course, for the integer landuse data bin edges [-0.5, 1.5, ..., 3.5] may be best.)
Friedrich
Ionut Sandric <sandricionut <at> yahoo.com> writes:
Thank you Zack:
By raster data I mean classified slope gradient (derived from a dem),
landuse-landcover, lithology etc. A
crosstabulation analysis will give me a table with the common areas for each
class from each raster and
this will go into other analysis. I can do it with other softwares (like
ArcGIS DEsktop etc), but I would
like to have all with numpy or to build something on top of numpy
Thanks's again
Ionut
----- Original Message ----- From: "Zachary Pincus" <zachary.pincus <at> yale.edu> To: "Discussion of Numerical Python" <numpy-discussion <at> scipy.org> Sent: Wednesday, July 14, 2010 9:42:49 PM GMT +02:00 Athens, Beirut,
Bucharest, Istanbul
Subject: Re: [Numpy-discussion] Crosstabulation
Hi Ionut,
Check out the "tabular" package: http://parsemydata.com/tabular/index.html
It seems to be basically what you want... it does "pivot tables" (aka crosstabulation), it's built on top of numpy, and has simple data IO tools.
Also check out this discussion on "pivot tables" from the numpy list a while ago: http://mail.scipy.org/pipermail/numpy-discussion/2007-August/028739.html
One question -- what do you mean by "raster data"? In my arena, that usually means "images"... and I'm not sure what a crosstabulation on image data would mean!
Zach
On Jul 14, 2010, at 2:28 PM, Ionut Sandric wrote:
Sorry, the first email was sent before to finish it...
Hi:
I have two raster data and I would like to do a crosstabulation between them and export the results to a table in a text file. Is it possible to do it with NumPy? Does someone have an example?
Thank you,
Ionut
NumPy-Discussion mailing list NumPy-Discussion <at> scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
NumPy-Discussion mailing list NumPy-Discussion <at> scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Hi,
You may be able to adapt this simple script to your case.
import numpy as np
# generate some test data def gen(b, e, m, n): return np.arange(b, e+ 1), np.random.randint(b, e+ 1, (m, n)) m, n= 15, 15 c1, d1= gen(0, 3, m, n); print d1 c2, d2= gen(3, 5, m, n); print d2
# perform actual x-tabulation xtab= np.zeros((len(c1), len(c2)), np.int) for i in xrange(len(c1)): tmp= d2[c1[i]== d1] for j in xrange(len(c2)): xtab[i, j]= np.sum(c2[j]== tmp) print xtab, np.sum(xtab)== np.prod(d1.shape)
Anyway it's straightforward to extend it to nd x-tabulations ;-).
My 2 cents, eat