Hi all.

Sorry if this question has already been asked. I've searched the archive but could not find anything related, so here is my question.

I'm using np.histogram on a 4000x4000 array, with 200 bins for each histogram. I do that on both dimensions, meaning I compute 8000 histograms. It takes around 5 seconds (which is of course quite fast).

I was wondering why np.histogram does not accept an axis parameter, so that it could work directly on the array without me having to write a loop. Or maybe I missed some parameter of np.histogram?

Thanks.

Éric.

--
Un clavier azerty en vaut deux
----------------------------------------------------------
Éric Depagne                            eric@depagne.org
Hi,
On Tue, Mar 29, 2011 at 4:29 PM, Éric Depagne wrote:
FWIW, have you considered using
http://docs.scipy.org/doc/numpy/reference/generated/numpy.histogramdd.html#numpy.histogramdd ?

Regards,
eat
I tried, but I get:

/usr/lib/pymodules/python2.6/numpy/lib/function_base.pyc in histogramdd(sample, bins, range, normed, weights)
    370     # Reshape is used so that overlarge arrays
    371     # will raise an error.
--> 372     hist = zeros(nbin, float).reshape(-1)
    373
    374     # Compute the sample indices in the flattened histogram matrix.

ValueError: sequence too large; must be smaller than 32

so I suspect my array is too big for histogramdd.

Éric.
Hi,
On Tue, Mar 29, 2011 at 5:13 PM, Éric Depagne wrote:
So it seems that you gave your array directly to histogramdd, asking for a 4000-dimensional histogram! Surely that's not what you are trying to achieve. Can you elaborate more on your objectives? Perhaps some code (slow but working) to demonstrate the point.

Regards,
eat
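For reference, histogramdd bins a set of D-dimensional sample points, one point per row, so a (4000, 4000) array is interpreted as 4000 points in 4000 dimensions; a minimal sketch of the intended usage:

```python
import numpy as np

# histogramdd bins *points*: each row of `pts` is one 2-D sample.
np.random.seed(0)
pts = np.random.rand(10000, 2)
H, edges = np.histogramdd(pts, bins=(20, 20))
# A (4000, 4000) image passed directly would be read as 4000-dimensional
# samples, hence the "sequence too large" ValueError.
```

Here H has shape (20, 20) and edges is a list of two edge arrays, one per dimension.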
Hi.

Sorry for not having been clearer. I'll explain a little bit.

I have 4k x 4k images that I want to analyse. I turn them into numpy arrays, so I have 4k x 4k np.arrays.

My analysis starts with determining the bias level. To do that, I compute a histogram for each line, and then for each row. So I compute 8000 histograms.

Here is the code I've used so far:

for i in range(self.data.shape[0]):
    # Compute a histogram along the columns: get counts and bounds.
    self.countsC[i], self.boundsC[i] = np.histogram(data[i], bins=self.bins)
for i in range(self.data.shape[1]):
    # Do the same along the rows.
    self.countsR[i], self.boundsR[i] = np.histogram(data[:, i], bins=self.bins)

And data.shape is (4000, 4000).

If histogram had an axis parameter, I could avoid the loop, and I guess it would be faster.

Éric.
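For what it's worth, np.apply_along_axis can at least hide the loop, though it still iterates in Python internally, so it is unlikely to be faster; a sketch assuming shared bin edges for all rows and columns (plain np.histogram would otherwise choose different edges per call):

```python
import numpy as np

# Shared bin edges over the global data range.
data = np.random.rand(100, 100)
bins = 20
edges = np.linspace(data.min(), data.max(), bins + 1)

# Histogram each row (axis 1) and each column (axis 0); apply_along_axis
# replaces the chosen axis with the function's output, so the column
# result comes back as (bins, ncols) and is transposed.
countsC = np.apply_along_axis(lambda r: np.histogram(r, bins=edges)[0], 1, data)
countsR = np.apply_along_axis(lambda c: np.histogram(c, bins=edges)[0], 0, data).T
```

Each row of countsC (and of countsR) is then one per-row (per-column) histogram.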
How about something like this:
# numpy 1.6
def rowhist(A, bins=100):
    assert bins > 0
    assert isinstance(bins, int)
    rownum = np.arange(A.shape[0]).reshape((-1, 1)).astype(int) * bins
    intA = (bins * (A - A.min()) / float(A.max() - A.min())).astype(int)
    intA[intA == bins] = bins - 1
    return np.bincount((intA + rownum).flatten(),
                       minlength=A.shape[0] * bins).reshape((A.shape[0], bins))
# numpy 1.5
def rowhist(A, bins=100):
    assert bins > 0
    assert isinstance(bins, int)
    rownum = np.arange(A.shape[0]).reshape((-1, 1)).astype(int) * bins
    intA = (bins * (A - A.min()) / float(A.max() - A.min())).astype(int)
    intA[intA == bins] = bins - 1
    counts = np.zeros(A.shape[0] * bins)
    bc = np.bincount((intA + rownum).flatten())
    counts[:len(bc)] = bc
    return counts.reshape((A.shape[0], bins))
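As a quick sanity check of the numpy 1.6 version (repeated here, with the minlength call completed, so the snippet runs standalone; only shapes and totals are asserted, since the uniform global bins need not match np.histogram's per-row edges exactly):

```python
import numpy as np

def rowhist(A, bins=100):
    # Offset each row's bin indices by row * bins so that a single
    # bincount yields all per-row histograms at once (uniform global bins).
    rownum = np.arange(A.shape[0]).reshape((-1, 1)) * bins
    intA = (bins * (A - A.min()) / float(A.max() - A.min())).astype(int)
    intA[intA == bins] = bins - 1  # put A.max() in the last bin
    return np.bincount((intA + rownum).flatten(),
                       minlength=A.shape[0] * bins).reshape((A.shape[0], bins))

np.random.seed(0)
A = np.random.rand(8, 1000)
counts = rowhist(A, bins=10)
# Every sample lands in exactly one bin of its own row.
assert counts.shape == (8, 10)
assert (counts.sum(axis=1) == 1000).all()
```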
On Wed, Mar 30, 2011 at 09:04, Éric Depagne wrote:
Hi,
On Wed, Mar 30, 2011 at 10:04 AM, Éric Depagne wrote:
Well, I guess that for a slight performance improvement you could create your own streamlined histogrammer.

But in order to better grasp your situation, it would be beneficial to know how the counts and bounds are used later on. I'm just wondering if this kind of massive histogramming could be avoided entirely.

Regards,
eat
Indeed. Here's what I do.

My images come from a CCD, and as such the zero level in the image is not the true zero level, but the true zero plus the background noise of each pixel.

By doing the histogram, I plan on detecting the most common value per row. Once I have the most common value, I can derive the interval where most of the values are (the index of the largest count is easily obtained by sorting the counts, and I take the slice [index_max_count, index_max_count + 1] in the second array given by the histogram). Then I take the mean value of this interval and assume it is the value of the bias for my row.

I do this procedure both on the rows and columns as a sanity check. And I know this procedure will not work if on any row/column there is a lot of signal and very little bias. I'll fix that afterwards ;-)

Éric.
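The procedure described above might be sketched like this (row_bias is a hypothetical name, argmax stands in for sorting the counts, and the synthetic frame is only for illustration):

```python
import numpy as np

def row_bias(data, bins=200):
    # Estimate each row's bias level as the centre of its modal histogram bin.
    bias = np.empty(data.shape[0])
    for i in range(data.shape[0]):
        counts, edges = np.histogram(data[i], bins=bins)
        j = counts.argmax()  # index of the most populated bin
        bias[i] = 0.5 * (edges[j] + edges[j + 1])  # mean of that bin's interval
    return bias

# Synthetic frame: true bias 100 plus Gaussian read noise, no signal.
np.random.seed(1)
frame = 100 + np.random.normal(0, 2, size=(50, 2000))
est = row_bias(frame)
```

The same function applied to data.T gives the per-column estimates for the sanity check.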
Hi,
On Wed, Mar 30, 2011 at 1:44 PM, Éric Depagne wrote:
Perhaps something along these lines will help you:

from numpy import histogram, linspace, r_

def hist2(a, nob):
    bins = linspace(a.min(), a.max(), nob + 1)
    d = linspace(0, bins[-1] * a.shape[0], a.shape[0])[:, None]
    b = (a + d).ravel()
    bbins = (bins[:-1] + d).ravel()
    bbins = r_[bbins, bbins[-1] + 1]
    counts, _ = histogram(b, bbins)
    return counts.reshape(-1, nob), bins

It has two disadvantages: 1) it needs more memory, and 2) it uses "global" bins (although that should be quite straightforward to enhance if needed).

Regards,
eat
participants (3)

- eat
- Thouis (Ray) Jones
- Éric Depagne