Re: [Numpy-discussion] Crosstabulation
Thank you Zack:
By raster data I mean classified slope gradient (derived from a dem), landuse-landcover, lithology etc. A crosstabulation analysis will give me a table with the common areas for each class from each raster and this will go into other analysis. I can do it with other softwares (like ArcGIS DEsktop etc), but I would like to have all with numpy or to build something on top of numpy
Thanks's again
Ionut
----- Original Message -----
From: "Zachary Pincus"
Sorry, the first email was sent before to finish it...
Hi:
I have two raster data and I would like to do a crosstabulation between them and export the results to a table in a text file. Is it possible to do it with NumPy? Does someone have an example?
Thank you,
Ionut
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
2010/7/14 Ionut Sandric
By raster data I mean classified slope gradient (derived from a dem), landuse-landcover, lithology etc. A crosstabulation analysis will give me a table with the common areas for each class from each raster and this will go into other analysis. I can do it with other softwares (like ArcGIS DEsktop etc), but I would like to have all with numpy or to build something on top of numpy
I'm afraid also Zach does not understand what you are talking about ... So my first question (please bear with me) would be: What's a dem? (n/a in my dictionary) And sorry for the cross-talk on the other first post by you ... And by slope gradient you mean second derivative? Friedrich
On Sat, Jul 17, 2010 at 13:11, Friedrich Romstedt
2010/7/14 Ionut Sandric
: By raster data I mean classified slope gradient (derived from a dem), landuse-landcover, lithology etc. A crosstabulation analysis will give me a table with the common areas for each class from each raster and this will go into other analysis. I can do it with other softwares (like ArcGIS DEsktop etc), but I would like to have all with numpy or to build something on top of numpy
I'm afraid also Zach does not understand what you are talking about ... So my first question (please bear with me) would be: What's a dem?
Digital Elevation Map.
(n/a in my dictionary) And sorry for the cross-talk on the other first post by you ...
And by slope gradient you mean second derivative?
No, the first derivative. "Slope gradient" is a reasonably common, albeit somewhat redundant, idiom meaning the gradient of an elevation map. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
2010/7/17 Robert Kern
On Sat, Jul 17, 2010 at 13:11, Friedrich Romstedt
wrote: 2010/7/14 Ionut Sandric
: I'm afraid also Zach does not understand what you are talking about ... So my first question (please bear with me) would be: What's a dem? Digital Elevation Map.
(n/a in my dictionary) And sorry for the cross-talk on the other first post by you ...
And by slope gradient you mean second derivative?
No, the first derivative. "Slope gradient" is a reasonably common, albeit somewhat redundant, idiom meaning the gradient of an elevation map.
Thanks Robert, that clarifies a lot. But still I don't understand how the crosstabulation shall work. What are the "classes"? Friedrich
Hi Friedrich:
For land-use a class would be for example forest, other would be orchard etc.
For Slope gradient I would have values which <3 and between 3 and 7 etc. So, I
will have 2 raster data with, let's say, 3 classes each: forest, orchards and
built-up area and for slope gradient: 0-3, 3-15, 15-35. The cross-tabulation
analysis should give me a table like:
forest orchards built-up
0-3 10 20 15
3-15 5 10 20
15-35 5 15 15
where the numbers represents all the common cells, for example: 10 cells with
forest correspond to 10 cells with 0-3 slope gradient interval and so on
(by cells I mean the pixel from a raster data)
The analysis is better illustrated here:
http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?TopicName=tabulate_area
Ionut
________________________________
From: Friedrich Romstedt
On Sat, Jul 17, 2010 at 13:11, Friedrich Romstedt
wrote: 2010/7/14 Ionut Sandric
: I'm afraid also Zach does not understand what you are talking about ... So my first question (please bear with me) would be: What's a dem? Digital Elevation Map.
(n/a in my dictionary) And sorry for the cross-talk on the other first post by you ...
And by slope gradient you mean second derivative?
No, the first derivative. "Slope gradient" is a reasonably common, albeit somewhat redundant, idiom meaning the gradient of an elevation map.
Thanks Robert, that clarifies a lot. But still I don't understand how the crosstabulation shall work. What are the "classes"? Friedrich _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
On 07/19/2010 09:55 AM, sandric ionut wrote:
Hi Friedrich:
For land-use a class would be for example forest, other would be orchard etc. For Slope gradient I would have values which <3 and between 3 and 7 etc. So, I will have 2 raster data with, let's say, 3 classes each: forest, orchards and built-up area and for slope gradient: 0-3, 3-15, 15-35. The cross-tabulation analysis should give me a table like:
forest orchards built-up 0-3 10 &n bsp; 20 15 3-15 5 10 20 15-35 5 15 15
where the numbers represents all the common cells, for example: 10 cells with forest correspond to 10 cells with 0-3 slope gradient interval and so on (by cells I mean the pixel from a raster data)
The analysis is better illustrated here: http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?TopicName=tabulate_area
Ionut
Ha, we're messing up lingo's here :-) Need to switch to GIS (geographic information systems) dialect. - DEM = digital elevation map, usually a 2d array ('raster') with elevation values (for a certain area on earth) - slope gradient = the slope (literally, not as in math speak) of the surface depicted by the elevation map. Mostly defined as the maximum slope within a certain moving window; several competing methods to estimate/calculate slope exist. - land use/cover class: raster (array) where each cell ('pixel') has an integer value, which maps to some well defined land use at that location (e.g. 0 means sea, 1 means forest, 2 means agriculture, etc) - crosstabulation usually means some kind of 2d histogram, where the total number of raster cells with a certain value (e.g. depicting 'land use class') 'within' a range of values of another raster with the same shape (and matching locations). Like: how many cells of forest lie withing a slope range of 0-10 degrees? Right. On to the answers. I think you should look into numpy.histogram2d, where you can do exactly what you want. Your land use array is x, your slope gradient array = y, then you define the bins as your class numbers (for x) and your slope gradient ranges (for y), and you will get a pixel count for each bin combination. see: http://docs.scipy.org/doc/numpy/reference/generated/numpy.histogram2d.html Regards, Vincent Schut.
------------------------------------------------------------------------ *From:* Friedrich Romstedt
*To:* Discussion of Numerical Python *Sent:* Sun, July 18, 2010 12:09:04 AM *Subject:* Re: [Numpy-discussion] Crosstabulation 2010/7/17 Robert Kern
mailto:robert.kern@gmail.com>: On Sat, Jul 17, 2010 at 13:11, Friedrich Romstedt
mailto:friedrichromstedt@gmail.com> wrote: 2010/7/14 Ionut Sandric
mailto:sandricionut@yahoo.com>: I'm afraid also Zach does not understand what you are talking about ... So my first question (please bear with me) would be: What's a dem? Digital Elevation Map.
(n/a in my dictionary) And sorry for the cross-talk on the other first post by you ...
And by slope gradient you mean second derivative?
No, the first derivative. "Slope gradient" is a reasonably common, albeit somewhat redundant, idiom meaning the gradient of an elevation map.
Thanks Robert, that clarifies a lot.
But still I don't understand how the crosstabulation shall work. What are the "classes"?
Friedrich _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org mailto:NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
2010/7/19 sandric ionut
For land-use a class would be for example forest, other would be orchard etc. For Slope gradient I would have values which <3 and between 3 and 7 etc. So, I will have 2 raster data with, let's say, 3 classes each: forest, orchards and built-up area and for slope gradient: 0-3, 3-15, 15-35. The cross-tabulation analysis should give me a table like:
forest orchards built-up 0-3 10 20 15 3-15 5 10 20 15-35 5 15 15
where the numbers represents all the common cells, for example: 10 cells with forest correspond to 10 cells with 0-3 slope gradient interval and so on (by cells I mean the pixel from a raster data)
Okay everything is clear now. I would suggestest looping over the cells of the table. E.g. when: landuse_catmap slope_map are the respective maps, you can categorise the slope_map with slope_catmap = 0 * (0 <= slope_map) * (slope_map < 3) + 1 * (3 <= slope_map) * (slope_map < 15) + 2 * (15 <= slope_map) to get categories 0, 1, and 2, where the zero case is included here only for illustration, since it's just zero it can be omitted. Then you are maybe supposed to do: table = numpy.zeros((N_slope_cats, N_landuse_cats)) and finally the looping over its elements, where the looping over the map cells is done entirely by numpy: for slope_cat in xrange(0, N_slope_cats): for landuse_cat in xrange(0, N_landuse_cats): table[slope_cat, landuse_cat] = \ ((slope_catmap == slope_cat) * \ (landuse_catmap == landuse_cat)).sum() I believe there is some slightly faster but more obscure way doing this table creation in one single large step by putting the maps into a hyperdimensianal array but I also believe this is beyond our scope here if this is already fast enough. Friedrich
On 07/19/2010 10:14 PM, Friedrich Romstedt wrote:
2010/7/19 sandric ionut
: For land-use a class would be for example forest, other would be orchard etc. For Slope gradient I would have values which<3 and between 3 and 7 etc. So, I will have 2 raster data with, let's say, 3 classes each: forest, orchards and built-up area and for slope gradient: 0-3, 3-15, 15-35. The cross-tabulation analysis should give me a table like:
forest orchards built-up 0-3 10 20 15 3-15 5 10 20 15-35 5 15 15
where the numbers represents all the common cells, for example: 10 cells with forest correspond to 10 cells with 0-3 slope gradient interval and so on (by cells I mean the pixel from a raster data)
Okay everything is clear now. I would suggestest looping over the cells of the table. E.g. when:
landuse_catmap slope_map
are the respective maps, you can categorise the slope_map with
slope_catmap = 0 * (0<= slope_map) * (slope_map< 3) + 1 * (3<= slope_map) * (slope_map< 15) + 2 * (15<= slope_map)
to get categories 0, 1, and 2, where the zero case is included here only for illustration, since it's just zero it can be omitted.
Then you are maybe supposed to do:
table = numpy.zeros((N_slope_cats, N_landuse_cats))
and finally the looping over its elements, where the looping over the map cells is done entirely by numpy:
for slope_cat in xrange(0, N_slope_cats): for landuse_cat in xrange(0, N_landuse_cats): table[slope_cat, landuse_cat] = \ ((slope_catmap == slope_cat) * \ (landuse_catmap == landuse_cat)).sum()
I believe there is some slightly faster but more obscure way doing this table creation in one single large step by putting the maps into a hyperdimensianal array but I also believe this is beyond our scope here if this is already fast enough.
That 'slightly faster but more obscure way' would be histogram2d :-) Especially when you have many landuse and/or slope classes over which to loop, histogram2d will be more than just slightly faster. About obscure, well, I leave that to the user. Probably depends on your background. Personally I use it all the time. Loops are evil :-) your 'more obscure' code would become something like this: slope_bin_edges = [0, 3, 15, 35] landuse_bin_edges = [0, 1, 2, 3] crosstab = numpy.histogram2d(landuse, slope, bins=(landuse_bin_edges, slope_bin_edges)) VS.
Friedrich
2010/7/20 Vincent Schut
slope_bin_edges = [0, 3, 15, 35] landuse_bin_edges = [0, 1, 2, 3] crosstab = numpy.histogram2d(landuse, slope, bins=(landuse_bin_edges, slope_bin_edges))
I like it! I guess the actual bins are [0, 3), [3, 15) and [15, 35)?
From the docs, that is not so clear. Either way, the 35 or the 0 would be an outlier. So maybe it's better to specifiy a somewhat enlarged end boundary. (And, of course, for the integer landuse data bin edges [-0.5, 1.5, ..., 3.5] may be best.)
Friedrich
Ionut Sandric
Thank you Zack:
By raster data I mean classified slope gradient (derived from a dem), landuse-landcover, lithology etc. A crosstabulation analysis will give me a table with the common areas for each class from each raster and this will go into other analysis. I can do it with other softwares (like ArcGIS DEsktop etc), but I would like to have all with numpy or to build something on top of numpy
Thanks's again
Ionut
----- Original Message ----- From: "Zachary Pincus"
To: "Discussion of Numerical Python" Sent: Wednesday, July 14, 2010 9:42:49 PM GMT +02:00 Athens, Beirut, Bucharest, Istanbul Subject: Re: [Numpy-discussion] Crosstabulation Hi Ionut,
Check out the "tabular" package: http://parsemydata.com/tabular/index.html
It seems to be basically what you want... it does "pivot tables" (aka crosstabulation), it's built on top of numpy, and has simple data IO tools.
Also check out this discussion on "pivot tables" from the numpy list a while ago: http://mail.scipy.org/pipermail/numpy-discussion/2007-August/028739.html
One question -- what do you mean by "raster data"? In my arena, that usually means "images"... and I'm not sure what a crosstabulation on image data would mean!
Zach
On Jul 14, 2010, at 2:28 PM, Ionut Sandric wrote:
Sorry, the first email was sent before to finish it...
Hi:
I have two raster data and I would like to do a crosstabulation between them and export the results to a table in a text file. Is it possible to do it with NumPy? Does someone have an example?
Thank you,
Ionut
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion <at> scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion <at> scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Hi, You may be able to adapt this simple script to your case. import numpy as np # generate some test data def gen(b, e, m, n): return np.arange(b, e+ 1), np.random.randint(b, e+ 1, (m, n)) m, n= 15, 15 c1, d1= gen(0, 3, m, n); print d1 c2, d2= gen(3, 5, m, n); print d2 # perform actual x-tabulation xtab= np.zeros((len(c1), len(c2)), np.int) for i in xrange(len(c1)): tmp= d2[c1[i]== d1] for j in xrange(len(c2)): xtab[i, j]= np.sum(c2[j]== tmp) print xtab, np.sum(xtab)== np.prod(d1.shape) Anyway it's straightforward to extend it to nd x-tabulations ;-). My 2 cents, eat
participants (6)
-
eat
-
Friedrich Romstedt
-
Ionut Sandric
-
Robert Kern
-
sandric ionut
-
Vincent Schut