Hi, I need help ;-) I have here a testcase which works much faster in Matlab than in Numpy.

The following code takes less than 0.9 sec in Matlab, but 21 sec in Python. Numpy is 24 times slower than Matlab! The big trouble I have is that a large team of people within my company is ready to replace Matlab by Numpy/Scipy/Matplotlib, but I have to demonstrate that this kind of Python code executes with the same performance as Matlab, without writing a C extension. This is becoming a critical point for us.

This is a testcase that people would like to see working without any code restructuring. The reasons are:
- this way of writing is fairly natural.
- the original code which showed me the matlab/Numpy performance differences is much more complex, and can't benefit from broadcasting or other numpy tips (I can give this code later).

...So I really need to use the code below, without restructuring.

Numpy/Python code:
#####################################################################
import numpy
import time

print "Start test \n"

dim = 3000

a = numpy.zeros((dim,dim,3))

start = time.clock()
for i in range(dim):
    for j in range(dim):
        a[i,j,0] = a[i,j,1]
        a[i,j,2] = a[i,j,0]
        a[i,j,1] = a[i,j,2]
end = time.clock() - start

print "Test done, %f sec" % end
#####################################################################

Matlab code:
#####################################################################
'Start test'
dim = 3000;
tic;
a = zeros(dim,dim,3);
for i = 1:dim
    for j = 1:dim
        a(i,j,1) = a(i,j,2);
        a(i,j,2) = a(i,j,1);
        a(i,j,3) = a(i,j,3);
    end
end
toc
'Test done'
#####################################################################

Any idea on it? Did I miss something?

Thanks a lot in advance for your help.
Cheers,
Nicolas.
Nicolas ROUX wrote:
Hi,
I need help ;-) I have here a testcase which works much faster in Matlab than Numpy.
The following code takes less than 0.9 sec in Matlab, but 21 sec in Python. Numpy is 24 times slower than Matlab! The big trouble I have is that a large team of people within my company is ready to replace Matlab by Numpy/Scipy/Matplotlib, but I have to demonstrate that this kind of Python code executes with the same performance as Matlab, without writing a C extension. This is becoming a critical point for us.
This is a testcase that people would like to see working without any code restructuring. The reasons are:
- this way of writing is fairly natural.
- the original code which showed me the matlab/Numpy performance differences is much more complex, and can't benefit from broadcasting or other numpy tips (I can give this code later).
...So I really need to use the code below, without restructuring.
Numpy/Python code:
#####################################################################
import numpy
import time
print "Start test \n"
dim = 3000
a = numpy.zeros((dim,dim,3))
start = time.clock()
for i in range(dim):
    for j in range(dim):
        a[i,j,0] = a[i,j,1]
        a[i,j,2] = a[i,j,0]
        a[i,j,1] = a[i,j,2]
end = time.clock() - start
print "Test done, %f sec" % end
#####################################################################
Matlab code:
#####################################################################
'Start test'
dim = 3000;
tic;
a = zeros(dim,dim,3);
for i = 1:dim
    for j = 1:dim
        a(i,j,1) = a(i,j,2);
        a(i,j,2) = a(i,j,1);
        a(i,j,3) = a(i,j,3);
    end
end
toc
'Test done'
#####################################################################
Any idea on it? Did I miss something?
I think on recent versions of matlab there is nothing you can do without modifying the code: matlab has some JIT compilation for loops, which is supposed to speed up those cases - at least, that's what matlab claims. The above loops are typical examples where this should work reasonably well, I believe:

http://www.mathworks.com/access/helpdesk_r13/help/techdoc/matlab_prog/ch7_pe...

If you really have to use loops, then matlab will be faster. But maybe you don't; can you show us a more typical example?

cheers,

David
Nicolas ROUX wrote:
The big trouble I have is a large team of people within my company is ready to replace Matlab by Numpy/Scipy/Matplotlib,
we like that!
This is a testcase that people would like to see working without any code restructuring. The reasons are:
- this way of writing is fairly natural.
Only if you haven't wrapped your brain around array-oriented programming! (see below)
- the original code which showed me the matlab/Numpy performance differences is much more complex, and can't benefit from broadcasting or other numpy tips (I can later give this code)
so you're asking: "how can I make this code faster without changing it?" The only way to do that is to change python or numpy, and while it might be nice to do that to improve performance in this type of case, it's a tall order!

It's really not a good goal, anyway -- python/numpy is by no means a drop-in replacement for MATLAB -- they are very different beasts. Personally, I think most of the differences favor Python, but if you try to write python the same way you'd write MATLAB, you'll lose most of the benefits -- you might as well stick with MATLAB.

However, in this case, MATLAB was traditionally slow with loops and indexing and needed to be vectorized for decent performance as well. It looks like they now have a nice JIT compiler for this sort of thing -- to get a similar effect in numpy, you'll need to use weave or Cython or something, notably not as easy as having the interpreter just do it for you. I'd love to see a numpy-aware psyco some day, and maybe the new buffer interface will facilitate that, but it's inherently harder with numpy -- MATLAB at least used to be limited to 2-d arrays of doubles, so there was far less special casing to be done.

Even with this nifty JIT, I think Python has many advantages -- if your code is well written, there will be only a few places with these sorts of performance bottlenecks, and weave or Cython, or SWIG, or Ctypes, or f2py can all give you a good solution.

One other thought -- could numexpr help here?

About array-oriented programming:

A lot of folks seem to think that the only reason to "vectorize" code in MATLAB, numpy, etc., is for better performance, and that if MATLAB now has a good JIT, then there is no point -- I think that's a mistake. If you write your code to work with arrays of data, you get more compact, less bug-prone code than if you are working with indexed elements all the time. I also think the code is clearer most of the time.
I say most, because sometimes you do need "tricks" to vectorize, and they can obfuscate the code. I understand that this may be a simplified example, and the real use-case could be quite different. However:
a = numpy.zeros((dim,dim,3))
so we essentially have three square arrays stacked together -- what do they represent? that might help guide you, but without that, I can still see:
for i in range(dim):
    for j in range(dim):
this really means -- for every element of the 2-d arrays, which can be written as: a[:,:]
a[i,j,0] = a[i,j,1]
a[i,j,2] = a[i,j,0]
a[i,j,1] = a[i,j,2]
and this is simply swapping the three around. So, if you start out thinking in terms of a set of 2-d arrays, rather than a huge pile of elements, the code you will arrive at is more like:

a[:,:,0] = a[:,:,1]
a[:,:,2] = a[:,:,0]
a[:,:,1] = a[:,:,2]

with no loops. Or you could give them names:

a0 = a[:,:,0]
a1 = a[:,:,1]
a2 = a[:,:,2]

then:

a0[:] = a1
a2[:] = a0
a1[:] = a2

which, of course, is really:

a[:,:,:] = a1.reshape((dim,dim,1))

but I suspect that that's the result of a typo in the original.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R          (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115      (206) 526-6317 main reception
Chris.Barker@noaa.gov
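[Editorial note: a minimal, runnable sketch of the point above -- checking on a small array (sizes and the random data are illustrative, not from the thread) that the sliced form computes exactly what the nested loops do. Written for modern Python/NumPy.]

```python
import numpy as np

dim = 4  # small, so the check is cheap; the original test used 3000
rng = np.random.RandomState(0)
a = rng.rand(dim, dim, 3)
b = a.copy()

# element-by-element version -- what the original loop does
for i in range(dim):
    for j in range(dim):
        a[i, j, 0] = a[i, j, 1]
        a[i, j, 2] = a[i, j, 0]
        a[i, j, 1] = a[i, j, 2]

# sliced version -- one whole-plane operation per line, no Python-level loop
b[:, :, 0] = b[:, :, 1]
b[:, :, 2] = b[:, :, 0]
b[:, :, 1] = b[:, :, 2]

# both leave every plane equal to the original plane 1
assert np.array_equal(a, b)
```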
A Wednesday 07 January 2009, Christopher Barker escrigué: [clip]
Even with this nifty JIT, I think Python has many advantages -- if your code is well written, there will be a only a few places with these sorts of performance bottlenecks, and weave or Cython, or SWIG, or Ctypes, or f2py can all give you a good solution.
Agreed. Especially Cython, with the latest improvements for supporting optimized NumPy indexing (http://wiki.cython.org/tutorials/numpy), would make these loops run much faster.
One other thought -- could numexp help here?
I don't think so. Numexpr is for computing expressions like 'a-b**3-c' element-wise (a, b and c are arrays) quickly. The main reason for its high performance is that it avoids temporary copies of intermediate results. In order to use it, you first need to vectorize your loops, and that is not what Nicolas wants.

Cheers,

--
Francesc Alted
On 1/7/2009 6:51 PM, Christopher Barker wrote:
Even with this nifty JIT,
It is not a very nifty JIT. It can transform some simple loops into vectorized expressions. And it removes the overhead from indexing with doubles. But if you are among those that do

n = length(x)
m = 0
for i = 1.0 : n
    m = m + x(i)
end
m = m / n

instead of

m = mean(x)

it will be nifty enough.
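[Editorial note: the same contrast written in Python/NumPy -- a toy sketch (the array x is made up) comparing the explicit accumulation loop, which is what such a JIT has to rescue, with the built-in reduction that runs in compiled code.]

```python
import numpy as np

x = np.arange(10.0)

# loop version -- one interpreted iteration per element
m = 0.0
for i in range(len(x)):
    m += x[i]
m /= len(x)

# idiomatic version -- a single call into compiled code
assert m == x.mean()  # both give the same mean for this exactly-representable data
```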
A lot of folks seem to think that the only reason to "vectorize" code in MATLAB, numpy, etc., is for better performance, and that if MATLAB now has a good JIT, then there is no point -- I think that's a mistake.
Fortran 90/95 has array slicing as well. Sturla Molden
On 1/7/2009 4:16 PM, David Cournapeau wrote:
I think on recent versions of matlab, there is nothing you can do without modifying the code: matlab has some JIT compilation for loops, which is supposed to speed up those cases - at least, that's what is claimed by matlab.
Yes it does. After using both for more than 10 years, my impression is this:

- Matlab slicing creates new arrays. NumPy slicing creates views. NumPy is faster and more memory efficient.
- Matlab JIT compiles loops. NumPy does not. Matlab is faster for programmers that don't know how to use slices. But neither Matlab nor Python/NumPy is meant to be used like Java.
- Python has psyco. It is about as good as Matlab's JIT. But psyco has no knowledge of NumPy ndarrays.
- Using Cython is easier than writing Matlab MEX files.
- Python has better support for data structures, better built-in structures (tuples, lists, dicts, sets), and general-purpose libraries. Matlab has extensive numerical toolboxes that you can buy.
- Matlab passes function arguments by value (albeit COW optimized). Python passes references. This makes NumPy more efficient if you need to pass large arrays or array slices.
- Matlab tends to fragment the heap (hence the pack command). Python/NumPy does not. This makes long-running processes notoriously unstable on Matlab.
- Matlab has some numerical libraries that are better.
- I like the Matlab command prompt and IDE better. But it's not enough to make me want to use it.
- Python is a proper programming language. Matlab is a numerical scripting language - good for small scripts but not complex software systems.

Sturla Molden
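[Editorial note: the first point -- NumPy slicing creates views, Matlab slicing creates copies -- can be seen directly. A minimal sketch (array and values are illustrative):]

```python
import numpy as np

a = np.zeros((3, 3))

s = a[0, :]        # NumPy: a view into a's memory, no copy made
s[:] = 7.0
assert a[0, 0] == 7.0   # writing through the view changed a itself

b = a[0, :].copy() # an explicit copy behaves like Matlab's slicing
b[:] = 9.0
assert a[0, 0] == 7.0   # a is untouched by writes to the copy
```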
Well it is the best pitch for numpy versus matlab I have read so far :) (and I 100% agree) Xavier
_______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Nicolas ROUX wrote:
Hi,
I need help ;-) I have here a testcase which works much faster in Matlab than Numpy.
<SNIP>
for i in range(dim):
    for j in range(dim):
        a[i,j,0] = a[i,j,1]
        a[i,j,2] = a[i,j,0]
        a[i,j,1] = a[i,j,2]
<SNIP>
Any idea on it? Did I miss something?
I think you may have reduced the complexity a bit too much. The python code above sets all of the elements equal to a[i,j,1]. Is there any reason you can't use slicing to avoid the loops?

Ryan

--
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
On Wed, Jan 7, 2009 at 23:44, Ryan May wrote:
I think you may have reduced the complexity a bit too much. The python code above sets all of the elements equal to a[i,j,1]. Is there any reason you can't use slicing to avoid the loops?
Yes, I think so. I think the testcase is a matter of python loops vs matlab loops rather than python vs matlab.

--
Cheers,
Grissiom
I tried with matlab 2006a; I don't know if there is a JIT, but the main speed difference comes from the numpy array access. The test is actually biased in favor of python, since in the matlab code the initialization with zeros is inside the timed section, but outside it in the python version.

If I just put b = 1.0 inside the double loop (no numpy):

python: 1.453644 sec
matlab: 0.335249 sec (with zeros outside the loop: 0.060582 sec)

With the original array assignment:

python/numpy: 32.745030 sec
matlab: 1.633415 sec (with zeros outside the loop: 1.251597 sec)

(Putting the loop in a function and using psyco reduces the time by 30%.)

Josef
A test case closer to my applications is calling functions in loops:

Python
-----------------------------------
def assgn(a,i,j):
    a[i,j,0] = a[i,j,1] + 1.0
    a[i,j,2] = a[i,j,0]
    a[i,j,1] = a[i,j,2]
    return a

print "Start test \n"

dim = 300 #0

a = numpy.zeros((dim,dim,3))

start = time.clock()
for i in range(dim):
    for j in range(dim):
        assgn(a,i,j)
end = time.clock() - start

assert numpy.max(a) == 1.0  #added to check inplace substitution
print "Test done, %f sec" % end
---------------------------

matlab:
------------------------------------------------------------------
function a = tryloopspeed()
    'Start test'
    dim = 300;
    a = zeros(dim,dim,3);
    tic;
    for i = 1:dim
        for j = 1:dim
            a = assgn(a,i,j);
        end
    end
    toc
    'Test done'
end

function a = assgn(a,i,j)
    a(i,j,1) = a(i,j,2);
    a(i,j,2) = a(i,j,1);
    a(i,j,3) = a(i,j,3);
end
---------------------------

Note: I had to reduce the size of the matrix because I got impatient waiting for matlab.

time:
python: Test done, 0.486127 sec
matlab:
output = tryloopspeed();
ans = Start test
Elapsed time is 511.815971 seconds.
ans = Test done

511.815971/60.0 #minutes
ans = 8.530
matlab takes 1053 times as long as python. The problem is that, at least in my version of matlab, it copies function arguments when they are modified. It's possible to work around this, but it is not very clean.

So for simple loops python loses, but for other things, python wins by a huge margin. Unless somebody can spot a mistake in my timing.

Josef
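[Editorial note: Josef's point -- Python passes array references, so a function can modify the caller's array in place without any copy -- in a minimal, runnable form (sizes are illustrative; assgn mirrors the thread's function):]

```python
import numpy as np

def assgn(a, i, j):
    # mutates the caller's array -- NumPy arguments are references, not copies
    a[i, j, 0] = a[i, j, 1] + 1.0
    a[i, j, 2] = a[i, j, 0]
    a[i, j, 1] = a[i, j, 2]

a = np.zeros((2, 2, 3))
assgn(a, 0, 0)
assert a.max() == 1.0   # the change is visible without returning or reassigning a
```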
josef.pktd@gmail.com wrote:
So for simple loops python loses, but for other things, python wins by a huge margin.
which emphasizes the point that you can't write code the same way in the two languages, though I'd argue that that code needs refactoring in any language! However, numpy's reference semantics is definitely a strong advantage over MATLAB -- more flexibility in general.

-Chris
On 1/7/2009 6:56 PM, Christopher Barker wrote:
So for simple loops python loses, but for other things, python wins by a huge margin.
which emphasizes the point that you can't write code the same way in the two languages, though I'd argue that that code needs refactoring in any language!
Roux's example would be bad in either language. Slices ('vectorization' in Matlab lingo) are preferred in both cases. It's just that neither Matlab nor Python/NumPy was designed to be used like Java. For loops should not be abused in Python or in Matlab (but Matlab is more forgiving now than it used to be).

Sturla Molden
I'm missing namespaces in matlab: everything works like "from path import *", and it's more difficult to keep a larger project organized in matlab than in python. But I think matlab is ahead in parallelization (which I haven't used much), and learning matlab is easier than numpy. (dtypes and broadcasting are more restrictive in matlab but, for a beginner, easier to figure out.)

Josef
On 1/7/2009 7:52 PM, josef.pktd@gmail.com wrote:
But, I think, matlab is ahead in parallelization (which I haven't used much)
Not really. There is e.g. nothing like Python's multiprocessing package in Matlab. Matlab is generally single-threaded. Python is multi-threaded but there is a GIL. And having multiple Matlab processes running simultaneously consumes a lot of resources. Python is far better in this respect.

Don't confuse vectorization with parallelization. They are not the same. If you are going to do real parallelization, you are better off using Python with multiprocessing or mpi4py.
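[Editorial note: a minimal sketch of the multiprocessing approach mentioned above -- the function names (chunk_sum, parallel_sum) and sizes are made up for illustration. Each chunk is summed in a separate worker process, so the GIL is not a bottleneck; this is a sketch of the pattern, not a tuned implementation.]

```python
import numpy as np
from multiprocessing import Pool

def chunk_sum(chunk):
    # runs in a worker process; NumPy arrays are pickled across the boundary
    return chunk.sum()

def parallel_sum(x, nworkers=2):
    chunks = np.array_split(x, nworkers)      # split the work
    with Pool(nworkers) as pool:
        return sum(pool.map(chunk_sum, chunks))  # gather partial sums

if __name__ == "__main__":
    x = np.arange(1000.0)
    assert parallel_sum(x) == x.sum()
```

For anything beyond this embarrassingly parallel shape (shared state, large arrays), the pickling cost matters and mpi4py or shared memory becomes the better tool.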
and learning matlab is easier than numpy. (dtypes and broadcasting are more restrictive in matlab but, for a beginner, easier to figure out)
The available data types are about the same, at least last time I checked. (I am not thinking about Python built-ins here, but NumPy dtypes.)

Matlab does not have broadcasting. Array shapes must always match.

S.M.
On 7-Jan-09, at 2:26 PM, Sturla Molden wrote:
Matlab does not have broadcasting. Array shapes must always match.
Not totally true. They introduced a clunky, clunky syntax for it in version 7, IIRC, called 'bsxfun'. See http://tinyurl.com/9e7kyt . It's a better solution than indexing with a huge ones() or repmat()'ing all over the place, but not nearly as nice as NumPy overloading. David
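[Editorial note: for comparison, the NumPy operator overloading David refers to -- a toy sketch where a column and a row vector broadcast against each other with no repmat() or ones() scaffolding:]

```python
import numpy as np

col = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(4).reshape(1, 4)   # shape (1, 4)

# shapes (3,1) and (1,4) broadcast to (3,4) automatically
table = col * 10 + row
assert table.shape == (3, 4)
assert table[2, 3] == 23           # element [i, j] is 10*i + j
```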
Hi! Thanks a lot for your fast/detailed reply. A very good point for Numpy ;-)
I spent all my time preparing my testcase to share with you; that's why I didn't reply faster.

I understand the weakness of the missing JIT compiler in Python vs Matlab; that's why I investigated numpy vectorization/broadcasting (hoping to find a cool way to write our code in fast Numpy). I used the page http://www.scipy.org/PerformancePython to write my code efficiently in Numpy. While doing it I found one issue.

To have pretty code, I created p0 and p1 arrays of indexes. In "test8" I wished to see the commented line working, which is not the case. Having to use "ix_" is not pretty enough, and it seems not to work with further dimensions. Why is the commented line not working?

############################################
def test8():
    m = 1024
    n = 512
    Out = numpy.zeros((m,n))
    In = numpy.zeros((m,n))
    p0 = numpy.ogrid[0:m]
    p1 = numpy.ogrid[0:n]
    Out[0:m,0:n] = In[0:m,0:n]
    #Out[p0,p1] = In[p0,p1]    #This doesn't work
    Out[numpy.ix_(p0,p1)] = In[numpy.ix_(p0,p1)]
############################################

What is maybe not clear in the above code is that I don't want to predefine all possible ogrid/vectors. The number of possible ogrid/vectors is big if I need to define them all, and this vector definition becomes more painful.

So the Numpy vector style is fine if I can write something like:

Out[p0,p1] = In[p0,p1]        #2 dimensions case
Out[p0,p1,1] = In[p0,p1,1]    #3 dimensions case

But it is not fine if I have to add ".ix_()" or to multiply the number of vector definitions. Below is an example with 3 dimensions instead of 2.

############################################
def test9():
    m = 1024
    n = 512
    Out = numpy.zeros((m,n,3))
    In = numpy.zeros((m,n,3))
    p0 = numpy.ogrid[0:m]
    p1 = numpy.ogrid[0:n]
    Out[0:m,0:n,2] = In[0:m,0:n,2]
    #Out[p0,p1,2] = In[p0,p1,2]
    Out[numpy.ix_(p0,p1,2)] = In[numpy.ix_(p0,p1,2)]
############################################

Thanks again for your support ;-)
Cheers,
Nicolas.
I understand the weakness of the missing JIT compiler in Python vs Matlab; that's why I investigated numpy vectorization/broadcasting (hoping to find a cool way to write our code in fast Numpy).
I used the page http://www.scipy.org/PerformancePython to write my code efficiently in Numpy. While doing it I found one issue.
To have pretty code, I created p0 and p1 arrays of indexes.
I must admit I don't quite understand what you are trying to do, or what your problem is. If you just want to do

Out[:,:] = In[:,:]

there is no need for meshgrids (ogrid), for-loops, or whatever. It is brain dead to use nested for-loops or ogrid for this purpose in NumPy. It is equally foolish to use nested for-loops or meshgrid for this purpose in Matlab. If you do, I would seriously question your competence.

By the way, you can index ogrid with more than one dimension:

p = numpy.ogrid[:m, :n]
Out[p] = In[p]
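[Editorial note: a runnable sketch of why the multi-dimensional ogrid form works, at toy sizes. numpy.ogrid with two slices returns a column vector and a row vector whose shapes broadcast against each other to cover the whole grid. (Written with the two index arrays unpacked, which also works on current NumPy, where indexing with a plain list of arrays is an error.)]

```python
import numpy as np

m, n = 4, 3
In_ = np.arange(m * n, dtype=float).reshape(m, n)
Out = np.zeros((m, n))

p0, p1 = np.ogrid[:m, :n]          # shapes (4, 1) and (1, 3)
assert p0.shape == (m, 1) and p1.shape == (1, n)

Out[p0, p1] = In_[p0, p1]          # broadcasts to the full (4, 3) grid
assert np.array_equal(Out, In_)
```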
Sorry, my previous mail was probably not clear. That mail followed the thread we had before, so it carried some discussion legacy. I simplified the code to focus only on what I need, rather than bother you with the full code.

I wrote below a code closer to what I need, where you will agree that vectorization/broadcasting is needed to avoid nested loops. As I wrote in the first mail (added at the end), what is important is to keep the code not too ugly despite the vectorization syntax. (As written below, I try to demonstrate that vectorized/broadcast code can be as readable as a twice-nested loop.) The real code we have is even more complex, processing the array elements using 5x5 neighbours instead of 3x3.

######################################################
def test6():
    w = 3096
    h = 2048
    a = numpy.zeros((h,w))    #Normally loaded with real data
    b = numpy.zeros((h,w,3))
    w0 = numpy.ogrid[0:w-2]
    w1 = numpy.ogrid[1:w-1]
    w2 = numpy.ogrid[2:w]
    h0 = numpy.ogrid[0:h-2]
    h1 = numpy.ogrid[1:h-1]
    h2 = numpy.ogrid[2:h]
    p00, p10, p20 = [h0,w0], [h1,w0], [h2,w0]
    p01, p11, p21 = [h0,w1], [h1,w1], [h2,w1]
    p02, p12, p22 = [h0,w2], [h1,w2], [h2,w2]
    b[p11,1] = a[p11] + 1.23*a[p22] \
        - numpy.min([a[p11]-a[p00], a[p11]-a[p01], a[p11]-a[p02],
                     a[p11]-a[p10], a[p11]-a[p12],
                     a[p11]-a[p20], a[p11]-a[p21], a[p11]-a[p22]]) \
        + 0.123*numpy.max([a[p11]-a[p00], a[p11]-a[p01], a[p11]-a[p02],
                           a[p11]-a[p10], a[p11]-a[p12],
                           a[p11]-a[p20], a[p11]-a[p21], a[p11]-a[p22]])
######################################################

This code above is the one I wish to write, but it is not working. I hope you better understand my issue context ;-) Did I miss something?

Thanks for your help.
Cheers,
Nicolas.
I understand the weakness of the missing JITcompiler in Python vs Matlab, that's why I invistigated numpy vectorization/broadcast. (hoping to find a cool way to write our code in fast Numpy)
I used the page http://www.scipy.org/PerformancePython to write my code efficiently in Numpy. While doing it I found one issue.
To have pretty code, I created p0 and p1 arrays of indexes.
I must admit I don't quite understand what you are trying to do, and what your problem is. If you just want to do Out[:,:] = In[:,:] there is no need for meshgrids (ogrid), for-loops, or whatever. It is brain dead to use nested for-loops or ogrid for this purpose in NumPy. It is equally foolish to use nested for loops or meshgrid for this purpose in Matlab. If you do, I would seriously question your competence. By the way, you can index ogrid with more than one dimension: p = numpy.ogrid[:m, :n] Out[p] = In[p] ============================================================================ =============== ============================================================================ =============== ============================================================================ =============== Hi ! Thanks a lot for your fast/detailed reply. A very good point for Numpy ;-) I spent all my time trying to prepare my testcase to better share with you, that's why I didn't reply fast. I understand the weakness of the missing JITcompiler in Python vs Matlab, that's why I invistigated numpy vectorization/broadcast. (hoping to find a cool way to write our code in fast Numpy) I used the page http://www.scipy.org/PerformancePython to write my code efficiently in Numpy. While doing it I found one issue. To have pretty code, I created p0 and p1 arrays of indexes. In "test8" I wished to see the commented line working, which is not the case. Having to use "ix_" is not pretty enough, and seems to not work with further dimensions. Why the comment line is not working ? ############################################ def test8(): m = 1024 n = 512 Out = numpy.zeros((m,n)) In = numpy.zeros((m,n)) p0 = numpy.ogrid[0:m] p1 = numpy.ogrid[0:n] Out[0:m,0:n] = In[0:m,0:n] #Out[p0,p1] = In[p0,p1] #This doesn't work Out[numpy.ix_(p0,p1)] = In[numpy.ix_(p0,p1)] ############################################ What is maybe not clear in the above code, is that I don't want to predefine all possible ogrid/vector. 
The number of possible ogrid/vectors is big if I need to define them all, and the vector definitions become more painful. So the Numpy vector style is fine if I can write something like:

Out[p0,p1] = In[p0,p1]        # 2-dimensional case
Out[p0,p1,1] = In[p0,p1,1]    # 3-dimensional case

But it is not fine if I have to add ".ix_()" or to multiply the number of vector definitions. Below is an example with 3 dimensions instead of 2.

############################################
def test9():
    m = 1024
    n = 512
    Out = numpy.zeros((m,n,3))
    In = numpy.zeros((m,n,3))

    p0 = numpy.ogrid[0:m]
    p1 = numpy.ogrid[0:n]

    Out[0:m,0:n,2] = In[0:m,0:n,2]
    #Out[p0,p1,2] = In[p0,p1,2]
    Out[numpy.ix_(p0,p1,2)] = In[numpy.ix_(p0,p1,2)]
############################################

Thanks again for your support ;-)

Cheers,
Nicolas.

===========================================================================
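As an aside on test9: numpy.ix_ only accepts 1-D sequences, so the scalar third index has to be wrapped in a one-element list for the last line to work. A sketch of the working 3-D form, with made-up sizes:

```python
import numpy

m, n = 4, 3
Out = numpy.zeros((m, n, 3))
In = numpy.arange(m * n * 3).reshape(m, n, 3).astype(float)

p0 = numpy.arange(m)
p1 = numpy.arange(n)

# ix_ rejects a bare scalar, so the axis index 2 is wrapped in a list;
# the three pieces then broadcast to shape (m, n, 1):
idx = numpy.ix_(p0, p1, [2])
Out[idx] = In[idx]
assert (Out[:, :, 2] == In[:, :, 2]).all()
assert (Out[:, :, :2] == 0).all()
```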
I simplified the code to focus only on "what I" need, rather than bother you with the full code.
def test():
    w = 3096
    h = 2048
    a = numpy.zeros((h,w), order='F')    # Normally loaded with real data
    b = numpy.zeros((h,w,3), order='F')

    w0 = slice(0,w-2)
    w1 = slice(1,w-1)
    w2 = slice(2,w)
    h0 = slice(0,h-2)
    h1 = slice(1,h-1)
    h2 = slice(2,h)

    p00, p10, p20 = [h0,w0], [h1,w0], [h2,w0]
    p01, p11, p21 = [h0,w1], [h1,w1], [h2,w1]
    p02, p12, p22 = [h0,w2], [h1,w2], [h2,w2]

    b[p11 + [1]] = a[p11] + 1.23*a[p22] \
        - numpy.min([a[p11]-a[p00], a[p11]-a[p01], a[p11]-a[p02],
                     a[p11]-a[p10], a[p11]-a[p12],
                     a[p11]-a[p20], a[p11]-a[p21], a[p11]-a[p22]]) \
        + 0.123*numpy.max([a[p11]-a[p00], a[p11]-a[p01], a[p11]-a[p02],
                           a[p11]-a[p10], a[p11]-a[p12],
                           a[p11]-a[p20], a[p11]-a[p21], a[p11]-a[p22]])

Does this work for you?
Thanks!

-1- The code style is good and the performance vs Matlab is good. With 400x400:

Matlab = 1.56 sec (with nested "for" loops, so no optimization)
Numpy  = 0.99 sec (with broadcasting)

-2- Now with the code below I have a strange result.

With w=h=400:
- Using "slice"       => 0.99 sec
- Using "numpy.ogrid" => 0.01 sec

With w=400 and h=300:
- Using "slice"       => 0.719 sec
- Using "numpy.ogrid" => broadcast ERROR!

The broadcast error is:
"ValueError: shape mismatch: objects cannot be broadcast to a single shape"

#######################################################
def test():
    w = 400
    if 1:    #---Case with different w and h
        h = 300
    else:    #---Case with same w and h
        h = 400

    a = numpy.zeros((h,w))    # Normally loaded with real data
    b = numpy.zeros((h,w,3))

    if 1:    #---Case with SLICE
        w0 = slice(0,w-2)
        w1 = slice(1,w-1)
        w2 = slice(2,w)
        h0 = slice(0,h-2)
        h1 = slice(1,h-1)
        h2 = slice(2,h)
    else:    #---Case with OGRID
        w0 = numpy.ogrid[0:w-2]
        w1 = numpy.ogrid[1:w-1]
        w2 = numpy.ogrid[2:w]
        h0 = numpy.ogrid[0:h-2]
        h1 = numpy.ogrid[1:h-1]
        h2 = numpy.ogrid[2:h]

    p00, p01, p02 = [h0,w0], [h0,w1], [h0,w2]
    p10, p11, p12 = [h1,w0], [h1,w1], [h1,w2]
    p20, p21, p22 = [h2,w0], [h2,w1], [h2,w2]

    b[p11+[1]] = a[p11] \
        - numpy.min([a[p11]-a[p00], a[p11]-a[p01], a[p11]-a[p02],
                     a[p11]-a[p10], a[p11]-a[p12],
                     a[p11]-a[p20], a[p11]-a[p21], a[p11]-a[p22]]) \
        + 0.123*numpy.max([a[p11]-a[p00], a[p11]-a[p01], a[p11]-a[p02],
                           a[p11]-a[p10], a[p11]-a[p12],
                           a[p11]-a[p20], a[p11]-a[p21], a[p11]-a[p22]])
#######################################################

It seems "ogrid" gets better performance, but broadcasting is not working any more. Do you think it's normal that broadcasting is not possible using ogrid and different w & h? Did I miss any row/column mismatch?

Thanks.
Cheers,
Nicolas

-----Original Message-----
From: numpy-discussion-bounces@scipy.org [mailto:numpy-discussion-bounces@scipy.org] On Behalf Of Sturla Molden
Sent: Friday, January 09, 2009 4:47 PM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Numpy performance vs Matlab.
Nicolas ROUX wrote:
-2- Now with the code below I have strange result. With w=h=400:
With w=400 and h=300:
- Using "numpy.ogrid" => broadcast ERROR!
The last broadcast error is: "ValueError: shape mismatch: objects cannot be broadcast to a single shape"
This is probably a broadcasting error, which means your result with w=h is probably wrong.

My advice:

Always test with non-square arrays, to make sure you are broadcasting as you expect.

Don't test with "zeros" or "ones" for your test data -- so you can look at the results and see if you are getting what you expect. I often use something like:

a = numpy.arange(w*h).reshape((h,w))

I also test with very small dimensions so that I can easily print the results of each step as I develop the code.

With this approach you will probably figure out what's going on.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R           (206) 526-6959 voice
7600 Sand Point Way NE  (206) 526-6329 fax
Seattle, WA 98115       (206) 526-6317 main reception

Chris.Barker@noaa.gov
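A runnable sketch of that debugging recipe, with made-up dimensions: deliberately non-square, every element distinct, small enough to print:

```python
import numpy

# Tiny, non-square test array with distinct values, per the advice above:
w, h = 5, 4                              # deliberately different
a = numpy.arange(w * h).reshape((h, w))  # every element unique
print(a)                                 # small enough to inspect by eye

# With non-square data, a shape mistake fails loudly instead of
# silently "working" as it would for w == h:
try:
    a[numpy.arange(h), numpy.arange(w)]  # (4,) vs (5,): cannot broadcast
except (ValueError, IndexError) as exc:
    print("caught:", exc)
```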
-2- Now with the code below I have strange result. With w=h=400: - Using "slice" => 0.99 sec - Using "numpy.ogrid" => 0.01 sec
It is not equivalent. The ogrid version only uses diagonal elements, and does less work.
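To see why: equal-length 1-D index arrays are paired elementwise, so with w == h the ogrid version touches only the diagonal, not the full block. A small sketch:

```python
import numpy

a = numpy.arange(16).reshape(4, 4)

rows = numpy.arange(4)
cols = numpy.arange(4)

# Two equal-length 1-D index arrays are iterated in parallel:
# this picks a[0,0], a[1,1], a[2,2], a[3,3] -- the diagonal, 4 elements...
diag = a[rows, cols]
print(diag)            # [ 0  5 10 15]

# ...whereas two slices select the whole 4x4 block, 16 elements:
block = a[0:4, 0:4]
assert diag.shape == (4,)
assert block.shape == (4, 4)
```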
It seems "ogrid" got better performance, but broadcasting is not working any more.
Broadcasting is working, but not the way you think. Ogrid is not a faster alternative to slicing. You have the same in Matlab. You can index with a slice, an array of indices, or an array of booleans. If you are going to use the second alternative, the shape of the index arrays -- in each dimension -- must equal that of the output. You cannot use a "meshgrid" with different shaped arrays of x, y and z indices. NumPy is no different from Matlab here.
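A small sketch of the rule above: the index arrays, one per dimension, must broadcast to the shape of the desired output (sizes here are made up):

```python
import numpy

a = numpy.arange(12).reshape(3, 4)

# Index arrays must broadcast to the shape of the desired output.
# A (3, 1) column of row indices against a (1, 4) row of column indices
# broadcasts to (3, 4) and selects the full block -- this is exactly
# what ogrid[0:3, 0:4] would hand back:
rows = numpy.arange(3).reshape(3, 1)   # shape (3, 1)
cols = numpy.arange(4).reshape(1, 4)   # shape (1, 4)
assert (a[rows, cols] == a).all()
```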
On Fri, Jan 9, 2009 at 11:32, Nicolas ROUX
Thanks !
-1- The code style is good and the performance vs Matlab is good. With 400x400:
Matlab = 1.56 sec (with nested "for" loops, so no optimization)
Numpy  = 0.99 sec (with broadcasting)
-2- Now with the code below I have a strange result.

With w=h=400:
- Using "slice"       => 0.99 sec
- Using "numpy.ogrid" => 0.01 sec

With w=400 and h=300:
- Using "slice"       => 0.719 sec
- Using "numpy.ogrid" => broadcast ERROR!
The last broadcast error is: "ValueError: shape mismatch: objects cannot be broadcast to a single shape"
#######################################################
def test():
    w = 400
    if 1:    #---Case with different w and h
        h = 300
    else:    #---Case with same w and h
        h = 400

    a = numpy.zeros((h,w))    # Normally loaded with real data
    b = numpy.zeros((h,w,3))

    if 1:    #---Case with SLICE
        w0 = slice(0,w-2)
        w1 = slice(1,w-1)
        w2 = slice(2,w)
        h0 = slice(0,h-2)
        h1 = slice(1,h-1)
        h2 = slice(2,h)
    else:    #---Case with OGRID
        w0 = numpy.ogrid[0:w-2]
        w1 = numpy.ogrid[1:w-1]
        w2 = numpy.ogrid[2:w]
        h0 = numpy.ogrid[0:h-2]
        h1 = numpy.ogrid[1:h-1]
        h2 = numpy.ogrid[2:h]

    p00, p01, p02 = [h0,w0], [h0,w1], [h0,w2]
    p10, p11, p12 = [h1,w0], [h1,w1], [h1,w2]
    p20, p21, p22 = [h2,w0], [h2,w1], [h2,w2]

    b[p11+[1]] = a[p11] \
        - numpy.min([a[p11]-a[p00], a[p11]-a[p01], a[p11]-a[p02],
                     a[p11]-a[p10], a[p11]-a[p12],
                     a[p11]-a[p20], a[p11]-a[p21], a[p11]-a[p22]]) \
        + 0.123*numpy.max([a[p11]-a[p00], a[p11]-a[p01], a[p11]-a[p02],
                           a[p11]-a[p10], a[p11]-a[p12],
                           a[p11]-a[p20], a[p11]-a[p21], a[p11]-a[p22]])
#######################################################
It seems "ogrid" gets better performance, but broadcasting is not working any more. Do you think it's normal that broadcasting is not possible using ogrid and different w & h? Did I miss any row/column mismatch?
There are several things wrong. Please read this document for information about how indexing works in numpy.

http://docs.scipy.org/doc/numpy/user/basics.indexing.html

But basically, you want slices. Using ogrid correctly will be slower.

FWIW, ogrid with only one argument is fairly pointless. ogrid is intended to be used with multiple dimensions. If you just need one argument, use arange() or linspace(). With multiple arguments, ogrid will align the arrays such that they can be broadcast as you expect. Let's take a look at some examples:

In [1]: from numpy import *

In [2]: ogrid[0:5]
Out[2]: array([0, 1, 2, 3, 4])

In [3]: ogrid[0:6]
Out[3]: array([0, 1, 2, 3, 4, 5])

Two 1D arrays. Now, if you follow the discussion in the indexing document, you know that if I were to use these as index arrays, one for each axis, the indexing mechanism will try to iterate over them in parallel. Since they have incompatible shapes, this will fail.

Instead, if you put both arguments into ogrid:

In [4]: ogrid[0:5, 0:6]
Out[4]:
[array([[0],
        [1],
        [2],
        [3],
        [4]]),
 array([[0, 1, 2, 3, 4, 5]])]

We get the kind of arrays you need. These shapes are compatible, through broadcasting, and together form the indices to select out the part of the matrix you are interested in.

However, just using the slices on the matrix instead of passing the slices through ogrid is faster.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
Robert Kern wrote:
Instead, if you put both arguments into ogrid:
In [4]: ogrid[0:5, 0:6]
Out[4]:
[array([[0],
        [1],
        [2],
        [3],
        [4]]),
 array([[0, 1, 2, 3, 4, 5]])]
We get the kind of arrays you need. These shapes are compatible, through broadcasting, and together form the indices to select out the part of the matrix you are interested in.
However, just using the slices on the matrix instead of passing the slices through ogrid is faster.
So what is ogrid useful for?

Just curious...

-Chris
On Fri, Jan 9, 2009 at 15:40, Christopher Barker wrote:
So what is ogrid useful for?
Just curious...
Floating point grids.

x, y = ogrid[0:1:101j, 0:1:101j]

--
Robert Kern
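For concreteness: the imaginary "step" 101j asks for 101 points with the endpoint included, so each axis runs 0.0, 0.01, ..., 1.0. A quick sketch:

```python
import numpy

# A complex step count means "this many points, endpoints inclusive":
x, y = numpy.ogrid[0:1:101j, 0:1:101j]

assert x.shape == (101, 1) and y.shape == (1, 101)
assert x[0, 0] == 0.0 and x[-1, 0] == 1.0
assert abs(x[1, 0] - 0.01) < 1e-12   # uniform spacing of 1/100
```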
However, just using the slices on the matrix instead of passing the slices through ogrid is faster.
So what is ogrid useful for?
For the same problems where you would use meshgrid in Matlab. Certain graphics problems, for example: evaluating a surface z = f(x,y) over a grid of x,y values.
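A minimal sketch of that use, with a Gaussian bump standing in for f(x,y) (the function and grid extents here are made up):

```python
import numpy

# Sparse grid from ogrid broadcasts inside the expression, so the full
# 101x101 z array is built without materializing dense x/y grids:
x, y = numpy.ogrid[-2:2:101j, -2:2:101j]
z = numpy.exp(-(x**2 + y**2))       # z = f(x, y), a Gaussian bump

assert z.shape == (101, 101)
assert z.max() == z[50, 50]         # peak at the center (x = y = 0)
```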
Sturla Molden wrote:
For the same problems where you would use meshgrid in Matlab.
well, I used to use meshgrid a lot because MATLAB could not do broadcasting. Which is probably why the OP has been trying to use it.

A note for the docs:

The docs refer to ogrid and nd_grid, and as far as I can tell, they are the same thing, but it's confusing. Actually, as I look more, ogrid is an nd_grid with the sparse parameter set to True -- anyway, still confusing!

Also, what is the "o" for -- the mnemonic might be helpful.

thanks,
-Chris
On Fri, Jan 9, 2009 at 16:04, Christopher Barker wrote:
Sturla Molden wrote:
For the same problems where you would use meshgrid in Matlab.
well, I used to use meshgrid a lot because MATLAB could not do broadcasting. Which is probably why the OP has been trying to use it.
A note for the docs:
The docs refer to ogrid and nd_grid, and as far as I can tell, they are the same thing, but it's confusing. actuall, as I look more, ogrid is a nd_grid with the sparse parameter set to True -- anyway, still confusing!
I'm not sure why. The docstring seems fairly clear about this.
Also, what is the "o" for -- the mnemonic might be helpful
"open". mgrid and ogrid are instantiations of the nd_grid class with different parameters. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
Sturla Molden wrote:
For the same problems where you would use meshgrid in Matlab.
well, I used to use meshgrid a lot because MATLAB could not do broadcasting. Which is probably why the OP has been trying to use it.
mgrid and ogrid are both meshgrids, with ogrid having a sparse representation (i.e. it uses less memory). There was no need for broadcasting in the OP's code. It just seemed a bit confused. Regards, Sturla Molden
for i in range(dim):
    for j in range(dim):
        a[i,j,0] = a[i,j,1]
        a[i,j,2] = a[i,j,0]
        a[i,j,1] = a[i,j,2]
for i = 1:dim
    for j = 1:dim
        a(i,j,1) = a(i,j,2);
        a(i,j,2) = a(i,j,1);
        a(i,j,3) = a(i,j,3);
    end
end
Hi,

The two loops are not the same. As David stated, with a JIT, the loops may be vectorized by Matlab on the fly.

--
Information System Engineer, Ph.D.
Website: http://matthieu-brucher.developpez.com/
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92
LinkedIn: http://www.linkedin.com/in/matthieubrucher
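For reference, the testcase's nested Python loops collapse to whole-plane slice assignments in NumPy, which run at C speed instead of interpreter speed. A sketch (array size reduced from the testcase's 3000 just to keep the run quick):

```python
import numpy

dim = 300   # reduced from the testcase's 3000 for a quick run
a = numpy.random.rand(dim, dim, 3)

# The nested loops in the testcase do nothing but whole-plane copies,
# so they collapse to three slice assignments:
a[:, :, 0] = a[:, :, 1]
a[:, :, 2] = a[:, :, 0]
a[:, :, 1] = a[:, :, 2]

# After the copies, all three planes hold the original plane 1:
assert (a[:, :, 0] == a[:, :, 1]).all()
assert (a[:, :, 0] == a[:, :, 2]).all()
```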
On Wed, Jan 7, 2009 at 10:19, Nicolas ROUX wrote:
Hi,
I need help ;-) I have here a testcase which works much faster in Matlab than Numpy.
The following code takes less than 0.9sec in Matlab, but 21sec in Python. Numpy is 24 times slower than Matlab ! The big trouble I have is a large team of people within my company is ready to replace Matlab by Numpy/Scipy/Matplotlib, but I have to demonstrate that this kind of Python Code is executed with the same performance than Matlab, without writing C extension. This is becoming a critical point for us.
This is a testcase that people would like to see working without any code restructuring.
Basically, if you want efficient numpy code, you have to use numpy idioms. If you want to continue to use Matlab idioms, keep using Matlab.
The reasons are: - this way of writing is fairly natural. - the original code which showed me the matlab/Numpy performance differences is much more complex, and can't benefit from broadcasting or other numpy tips (I can later give this code)
Please do. Otherwise, we can't actually address your concerns.

--
Robert Kern
participants (13)

- Christopher Barker
- David Cournapeau
- David Warde-Farley
- Francesc Alted
- Gael Varoquaux
- Grissiom
- josef.pktd@gmail.com
- Matthieu Brucher
- Nicolas ROUX
- Robert Kern
- Ryan May
- Sturla Molden
- Xavier Gnata