
I am very new to Numpy and relatively new to Python. I have used Matlab for 15+ years now. But, I am starting to lean toward using Numpy for all my work.
One thing that I am not understanding is the order of data when read in from a file. Let's say I have a 256x256x150 uint16 dataset (MRI, 150 slices). In Matlab I would read it in as:
fp=fopen(<binary file>); d = fread(fp,256*256*150, 'int16'); fclose(fp); c = reshape(d, [256 256 150]); imagesc(c(:,:,1));
I am very used to having it read it in and doing the reshaping such that it is 256 rows by 256 columns by 150 slices.
Now, Numpy, I can read in the binary data using fromfile (or open, read, close):
In [85]: a = fromfile(<binary file>, dtype='int16') In [86]: b = array(struct.unpack('<%dH'%(256*256*150), a)).reshape(150,256,256)
So, in Numpy I have to reshape it so the "slices" are in the first dimension. Obviously, I can do a b.transpose( (1,2,0) ) to get it to look like Matlab, but...
I don't understand why the index ordering is different between Matlab and Numpy. (It isn't a C/Fortran ordering thing, I don' think).
Is the data access faster if I have b without the tranpose, or can I transpose it so it "looks" like Matlab without taking a hit when I do imshow( b[:,:,0] ).
Any help for a Numpy newbie would be appreciated.

On Tue, May 12, 2009 at 2:51 PM, brechmos craig@brechmos.org wrote:
So, in Numpy I have to reshape it so the "slices" are in the first dimension. Obviously, I can do a b.transpose( (1,2,0) ) to get it to look like Matlab, but...
I don't understand why the index ordering is different between Matlab and Numpy. (It isn't a C/Fortran ordering thing, I don' think).
Actually, that's precisely the reason.
Is the data access faster if I have b without the tranpose, or can I transpose it so it "looks" like Matlab without taking a hit when I do imshow( b[:,:,0] ).
It's going to be faster to do it without the transpose. Besides, for numpy, that imshow becomes:
imshow(b[0])
Which, IMHO, looks better than Matlab.
Ryan

Ah, hah.
In [3]: c = b.reshape((256,256,150), order='F')
Ok, I needed more coffee.
If I do it this way (without the transpose), it should be as fast as c=b.reshape((150,256,256)), right? It is just changing the stride (or something like that)? Or is it going to be faster without changing the order?
Thanks for the help.
Ryan May-3 wrote:
On Tue, May 12, 2009 at 2:51 PM, brechmos craig@brechmos.org wrote:
So, in Numpy I have to reshape it so the "slices" are in the first dimension. Obviously, I can do a b.transpose( (1,2,0) ) to get it to look like Matlab, but...
I don't understand why the index ordering is different between Matlab and Numpy. (It isn't a C/Fortran ordering thing, I don' think).
Actually, that's precisely the reason.
Is the data access faster if I have b without the tranpose, or can I transpose it so it "looks" like Matlab without taking a hit when I do imshow( b[:,:,0] ).
It's going to be faster to do it without the transpose. Besides, for numpy, that imshow becomes:
imshow(b[0])
Which, IMHO, looks better than Matlab.
Ryan
-- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma
Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

On Tue, May 12, 2009 at 14:55, Ryan May rmay31@gmail.com wrote:
On Tue, May 12, 2009 at 2:51 PM, brechmos craig@brechmos.org wrote:
So, in Numpy I have to reshape it so the "slices" are in the first dimension. Obviously, I can do a b.transpose( (1,2,0) ) to get it to look like Matlab, but...
I don't understand why the index ordering is different between Matlab and Numpy. (It isn't a C/Fortran ordering thing, I don' think).
Actually, that's precisely the reason.
To expand on this comment, when Matlab was first released, it was basically just an interactive shell on top of FORTRAN routines from LAPACK and other linear algebra *PACKs. Consequently, it standardized on FORTRAN's column-major format.
While numpy isn't really beholden to C's ordering for multidimensional arrays (numpy arrays are just blocks of strided memory, not x[i][j][k] arrays of pointers to arrays of pointers to arrays), we do want consistency with the equivalent nested Python lists, and that does imply row-major formatting by default.

On 12-May-09, at 3:55 PM, Ryan May wrote:
It's going to be faster to do it without the transpose. Besides, for numpy, that imshow becomes:
imshow(b[0])
Which, IMHO, looks better than Matlab.
You're right, that is better, odd how I never thought of doing it like that. I've been stuck in my Matlab-esque world with dstack() as my default mental model of how images/matrices ought to be stacked.
Am I right in thinking that b[0] is stored in a big contiguous block of memory, thus making the read marginally faster than slicing on the third?
David

This is interesting.
I have always done RGB imaging with numpy using arrays of shape (height, width, 3). In fact, this is the form that PIL gives when calling np.asarray() on a PIL image.
It does seem more efficient to be able to do a[0],a[1],a[2] to get the R, G, and B channels respectively. This, obviously is not currently the case.
Would it be better for me to switch to this way of doing things and/or work a patch for PIL so that the array is built in the form (3, height, width)?
Chris
On Tue, May 12, 2009 at 4:14 PM, David Warde-Farley dwf@cs.toronto.eduwrote:
On 12-May-09, at 3:55 PM, Ryan May wrote:
It's going to be faster to do it without the transpose. Besides, for numpy, that imshow becomes:
imshow(b[0])
Which, IMHO, looks better than Matlab.
You're right, that is better, odd how I never thought of doing it like that. I've been stuck in my Matlab-esque world with dstack() as my default mental model of how images/matrices ought to be stacked.
Am I right in thinking that b[0] is stored in a big contiguous block of memory, thus making the read marginally faster than slicing on the third?
David _______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

On Tue, May 12, 2009 at 15:32, Chris Colbert sccolbert@gmail.com wrote:
This is interesting.
I have always done RGB imaging with numpy using arrays of shape (height, width, 3). In fact, this is the form that PIL gives when calling np.asarray() on a PIL image.
It does seem more efficient to be able to do a[0],a[1],a[2] to get the R, G, and B channels respectively. This, obviously is not currently the case.
It's not *that* much more efficient.
Would it be better for me to switch to this way of doing things and/or work a patch for PIL so that the array is built in the form (3, height, width)?
Submitting a patch for PIL would neither be successful, nor worth your time. Not to mention breaking existing code.
participants (5)
-
brechmos
-
Chris Colbert
-
David Warde-Farley
-
Robert Kern
-
Ryan May