
Hi,

When trying to speed up some matplotlib routines with the matplotlib dev team, I noticed that numpy.clip is pretty slow: clip(data, m, M) is slower than a direct numpy implementation (that is, data[data<m] = m; data[data>M] = M; return data.copy()). My understanding is that the code does the same thing, right?

Below is a small script which shows the difference (twice as slow for an 8000x256 array on my workstation):

import numpy as N

#==========================
# To benchmark imshow alone
#==========================
def generate_data_2d(fr, nwin, hop, len):
    nframes = 1.0 * fr / hop * len
    return N.random.randn(nframes, nwin)

def bench_clip():
    m = -1.
    M = 1.
    # 2 minutes (120 sec) of sound @ 8 kHz with 256 samples with 50 % overlap
    data = generate_data_2d(8000, 256, 128, 120)

    def clip1_bench(data, niter):
        for i in range(niter):
            blop = N.clip(data, m, M)

    def clip2_bench(data, niter):
        for i in range(niter):
            data[data<m] = m
            data[data<M] = M
            blop = data.copy()

    clip1_bench(data, 10)
    clip2_bench(data, 10)

if __name__ == '__main__':
    # test clip
    import hotshot, hotshot.stats
    profile_file = 'clip.prof'
    prof = hotshot.Profile(profile_file, lineevents=1)
    prof.runcall(bench_clip)
    p = hotshot.stats.load(profile_file)
    print p.sort_stats('cumulative').print_stats(20)
    prof.close()

cheers,

David

David Cournapeau wrote:
Hi,
When trying to speed up some matplotlib routines with the matplotlib dev team, I noticed that numpy.clip is pretty slow: clip(data, m, M) is slower than a direct numpy implementation (that is, data[data<m] = m; data[data>M] = M; return data.copy()). My understanding is that the code does the same thing, right?
Below is a small script which shows the difference (twice as slow for an 8000x256 array on my workstation):
I think there was a bug in your clip2_bench that was making it artificially fast. Attached is a script that I think gives a fairer comparison, in which clip1 and clip2 are nearly identical, and which includes a third version using putmask that is faster than either of the others:

15 function calls in 6.450 CPU seconds

Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     1    0.004    0.004    6.450    6.450  cliptest.py:10(bench_clip)
     1    2.302    2.302    2.302    2.302  cliptest.py:19(clip2_bench)
     1    0.013    0.013    2.280    2.280  cliptest.py:15(clip1_bench)
    10    2.267    0.227    2.267    0.227  /usr/local/lib/python2.4/site-packages/numpy/core/fromnumeric.py:357(clip)
     1    1.498    1.498    1.498    1.498  cliptest.py:25(clip3_bench)
     1    0.366    0.366    0.366    0.366  cliptest.py:6(generate_data_2d)
     0    0.000             0.000           profile:0(profiler)

Eric
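Eric's attached script is not preserved in the archive, but from the profile line cliptest.py:25(clip3_bench), the putmask-based variant was presumably along these lines (a reconstruction, nested inside bench_clip like the other two so that m and M are in scope):

def clip3_bench(data, niter):
    for i in range(niter):
        # clip a fresh copy each pass using putmask instead of boolean indexing
        blop = data.copy()
        N.putmask(blop, blop <= m, m)
        N.putmask(blop, blop >= M, M)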

Hi David

The benchmark below isn't quite correct. In clip2_bench the data is effectively only clipped once. I attach a slightly modified version, for which the benchmark results look like this:

4 function calls in 4.631 CPU seconds

Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     1    0.003    0.003    4.631    4.631  clipb.py:10(bench_clip)
     1    2.149    2.149    2.149    2.149  clipb.py:16(clip1_bench)
     1    2.070    2.070    2.070    2.070  clipb.py:19(clip2_bench)
     1    0.409    0.409    0.409    0.409  clipb.py:6(generate_data_2d)
     0    0.000             0.000           profile:0(profiler)

The remaining difference is probably a cache effect. If I change the order, so that clip1_bench is executed last, I see:

4 function calls in 5.250 CPU seconds

Ordered by: cumulative time

ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     1    0.003    0.003    5.250    5.250  clipb.py:10(bench_clip)
     1    2.588    2.588    2.588    2.588  clipb.py:19(clip2_bench)
     1    2.148    2.148    2.148    2.148  clipb.py:16(clip1_bench)
     1    0.512    0.512    0.512    0.512  clipb.py:6(generate_data_2d)
     0    0.000             0.000           profile:0(profiler)

Regards
Stéfan

On Mon, Dec 18, 2006 at 04:17:08PM +0900, David Cournapeau wrote:
Hi,
When trying to speed up some matplotlib routines with the matplotlib dev team, I noticed that numpy.clip is pretty slow: clip(data, m, M) is slower than a direct numpy implementation (that is, data[data<m] = m; data[data>M] = M; return data.copy()). My understanding is that the code does the same thing, right?
Below is a small script which shows the difference (twice as slow for an 8000x256 array on my workstation):
[...]
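Stefan's modified script is likewise not preserved; given his description ("the data is effectively only clipped once") and David's later admission of the mistyped < and the misplaced copy, the corrected clip2_bench presumably looked something like this sketch:

def clip2_bench(data, niter):
    for i in range(niter):
        # copy first, so that every iteration clips the original data,
        # and use > (not <) for the upper bound
        blop = data.copy()
        blop[blop < m] = m
        blop[blop > M] = M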

Stefan van der Walt wrote:
Hi David
The benchmark below isn't quite correct. In clip2_bench the data is effectively only clipped once. I attach a slightly modified version, for which the benchmark results look like this:

Yes, I of course mistyped the < and the copy. But the function is still moderately faster on my workstation:
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     1    0.003    0.003    3.944    3.944  slowclip.py:10(bench_clip)
     1    0.011    0.011    2.001    2.001  slowclip.py:16(clip1_bench)
    10    1.990    0.199    1.990    0.199  /home/david/local/lib/python2.4/site-packages/numpy/core/fromnumeric.py:372(clip)
     1    1.682    1.682    1.682    1.682  slowclip.py:19(clip2_bench)
     1    0.258    0.258    0.258    0.258  slowclip.py:6(generate_data_2d)
     0    0.000             0.000           profile:0(profiler)

I agree this is not really much of a difference, though. The question is then, in the context of matplotlib, is there really a need to copy? Because if I do not copy the array before clipping, then the function is really faster (for those wondering, this is one bottleneck when calling matplotlib.imshow, used in specgram and so on).

cheers,

David

On Mon, Dec 18, 2006 at 05:45:09PM +0900, David Cournapeau wrote:
Yes, I of course mistyped the < and the copy. But the function is still moderately faster on my workstation:
Did you try swapping the order of execution (i.e. clip1 second)?

Cheers
Stéfan

Stefan van der Walt wrote:
Did you try swapping the order of execution (i.e. clip1 second)?

Yes, I tried different orders, etc., and it showed the same pattern. The thing is, this kind of thing is highly CPU dependent in my experience; I don't have the time right now to update numpy/scipy on my laptop, but it happens that profile results are quite different between my workstation (P4 Xeon) and my laptop (Pentium M).
Anyway, contrary to what I thought at first, the real problem is the copy, so this is where I should investigate in the matplotlib case,

David

David,

I think my earlier post got lost in the exchange between you and Stefan, so I will reiterate the central point: numpy.clip *is* slow, in that an implementation using putmask is substantially faster:

def fastclip(a, vmin, vmax):
    a = a.copy()
    putmask(a, a<=vmin, vmin)
    putmask(a, a>=vmax, vmax)
    return a

Using the equivalent of this in a modification of your benchmark, the time using the native clip *or* your alternative on my machine was about 2.3 s, versus 1.5 s for the putmask-based equivalent. It seems that putmask is quite a bit faster than boolean indexing.

Obviously, the function above could be implemented as a method, and a copy kwarg could be used to make the copy optional--often one does not need a copy.

It is also clear that it should be possible to make a much faster native clip function that does everything in one pass with no intermediate arrays at all. Whether this is something the numpy devels would want to do, and how much effort it would take, are entirely different questions. I looked at the present code in clip (and part of the way through the chain of functions it invokes) and was quite baffled.

Eric
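For readers who want to try the comparison outside hotshot, a minimal stand-alone harness along the following lines (the bench helper and the names boolclip/fastclip are mine, not from the thread) times the three approaches directly:

import time
import numpy as N

data = N.random.randn(7500, 256)  # roughly the array size used in the thread

def boolclip(a, vmin, vmax):
    # boolean-indexing version, like David's
    a = a.copy()
    a[a < vmin] = vmin
    a[a > vmax] = vmax
    return a

def fastclip(a, vmin, vmax):
    # putmask version, like Eric's
    a = a.copy()
    N.putmask(a, a <= vmin, vmin)
    N.putmask(a, a >= vmax, vmax)
    return a

def bench(f, niter=10):
    # wall-clock the callable over niter runs
    t0 = time.time()
    for i in range(niter):
        f()
    return time.time() - t0

for name, f in [('N.clip',   lambda: N.clip(data, -1., 1.)),
                ('boolclip', lambda: boolclip(data, -1., 1.)),
                ('fastclip', lambda: fastclip(data, -1., 1.))]:
    print name, bench(f)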

Eric Firing wrote:
It is also clear that it should be possible to make a much faster native clip function that does everything in one pass with no intermediate arrays at all. Whether this is something the numpy devels would want to do, and how much effort it would take, are entirely different questions.

Well, this is something I would be willing to try *if* this is the main bottleneck of imshow/show. I am still unsure about the problem, because if I change numpy.clip to my function, including a copy, I really do get a big difference myself:
val = ma.array(nx.clip(val.filled(vmax), vmin, vmax), mask=mask)

vs.

def myclip(b, m, M):
    a = b.copy()
    a[a<m] = m
    a[a>M] = M
    return a

val = ma.array(myclip(val.filled(vmax), vmin, vmax), mask=mask)

Taking the best result, I get 0.888 s vs 0.784 s for a show() call, which is already a 10 % improvement, and almost 15 % if I remove the copy. I am updating numpy/scipy/mpl on my laptop to see whether this is specific to the CPU of my workstation (big cache, high clock frequency, dual CPU with HT enabled). I would really like to see the imshow/show calls go into the range of a few hundred ms; for interactive plotting, this really changes a lot in my opinion.

cheers,
David
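The no-copy variant David alludes to would simply drop the copy and mutate the caller's array -- a sketch (hypothetical name, not from the thread):

def myclip_inplace(a, m, M):
    # clips a in place: no copy, the input array itself is modified
    a[a < m] = m
    a[a > M] = M
    return a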

On Tue, Dec 19, 2006 at 02:10:29PM +0900, David Cournapeau wrote:
I would really like to see the imshow/show calls go into the range of a few hundred ms; for interactive plotting, this really changes a lot in my opinion.
I think this is strongly dependent on some parameters. I did some interactive plotting on both a Pentium 2, Linux, WxAgg (thus Gtk behind Wx), and a Pentium 4, Windows, WxAgg (thus MFC behind Wx), and there was a huge difference between the speeds -- a few orders of magnitude. I couldn't explain it, but it was a good surprise, as the application was developed for the Windows box.

Gaël

Gael Varoquaux wrote:
I think this is strongly dependent on some parameters. [...]

I started to investigate the problem because under matlab, plotting a spectrogram is negligible compared to computing it, whereas in matplotlib with the numpy array backend, plotting it takes as much time as computing it, which didn't make sense to me.
Most of the computing time is spent in code that is independent of the backend, that is, during the conversion from the rank-2 array to rgba (60 % of the time on my fast workstation, 85 % of the time on my laptop with a Pentium M @ 1.2 GHz), so I don't think the GUI backend makes any difference.

cheers,
David

David Cournapeau wrote:
Taking the best result, I get 0.888 s vs 0.784 s for a show() call, which is already a 10 % improvement, and almost 15 % if I remove the copy. I am updating numpy/scipy/mpl on my laptop to see whether this is specific to the CPU of my workstation (big cache, high clock frequency, dual CPU with HT enabled).
Please try the putmask version without the copy on your machines; I expect it will be quite a bit faster on both machines. The relative speeds of the versions may differ widely depending on how many values actually get changed, though. Eric

Eric Firing wrote:
Please try the putmask version without the copy on your machines; I expect it will be quite a bit faster on both machines. The relative speeds of the versions may differ widely depending on how many values actually get changed, though.

On my workstation (dual Xeon; I ran each corresponding script 5 times and took the best result):
- nx.clip takes ~170 ms (of 920 ms for the whole show call)
- your fast clip, with copy: ~50 ms (of ~820 ms)
- mine, with copy: ~50 ms (of ~830 ms)
- yours, without copy: ~30 ms (of 830 ms)
- mine, without copy: ~40 ms (of 830 ms)
Same on my laptop (Pentium M @ 1.2 GHz):
- nx.clip takes ~230 ms (of 1460 ms)
- mine, with copy: ~70 ms (of 1200 ms)
- mine, without copy: ~55 ms (of 1300 ms)
- yours, with copy: ~80 ms (of 1300 ms)
- yours, without copy: ~67 ms (of 1300 ms)

Basically, at least from those figures, both versions are pretty similar, and not worth improving much anyway for matplotlib. There is something funny with the numpy version, though.

cheers,
David

David Cournapeau wrote:
Basically, at least from those figures, both versions are pretty similar, and not worth improving much anyway for matplotlib. There is something funny with the numpy version, though.
Looking at the code, it's certainly not surprising that the current implementation of clip() is slow. It is a direct numpy C API translation of the following (taken from numarray, but it is the same in Numeric):

def clip(m, m_min, m_max):
    """clip() returns a new array with every entry in m that is less than m_min
    replaced by m_min, and every entry greater than m_max replaced by m_max.
    """
    selector = ufunc.less(m, m_min) + 2*ufunc.greater(m, m_max)
    return choose(selector, (m, m_min, m_max))

Creating that integer selector array is probably the most expensive part. Copying the array, then using putmask() or similar, is certainly a better approach, and I can see no drawbacks to it.

If anyone is up to translating their faster clip() into C, I'm more than happy to check it in. I might also entertain adding a copy=True keyword argument, but I'm not entirely certain we should be expanding the API during the 1.0.x series.

-- Robert Kern
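As a concrete illustration of the cost: the selector is a full integer temporary the size of the input, built just to route every element through choose. A quick sanity check that the recipe matches clip (the function names are the numpy equivalents of the numarray ufunc calls above):

import numpy as N

m = N.random.randn(1000)
m_min, m_max = -0.5, 0.5
# selector: 0 = in range (keep m), 1 = below (take m_min), 2 = above (take m_max)
selector = N.less(m, m_min) + 2*N.greater(m, m_max)
out = N.choose(selector, (m, m_min, m_max))
assert N.allclose(out, N.clip(m, m_min, m_max))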

Robert Kern wrote:
If anyone is up to translating their faster clip() into C, I'm more than happy to check it in. I might also entertain adding a copy=True keyword argument, but I'm not entirely certain we should be expanding the API during the 1.0.x series.
I would be happy to code the function; for new code to be added to numpy, is there another branch than the current one? What is the approach for a 1.1.x version of numpy? For now, putting in the function with a copy (the current behaviour?) would be OK, right? The copy part is a much smaller problem than the rest of the function anyway, at least from my modest benchmarking,

David

David Cournapeau wrote:
I would be happy to code the function; for new code to be added to numpy, is there another branch than the current one? What is the approach for a 1.1.x version of numpy?
I don't think we've decided on one, yet.
For now, putting in the function with a copy (the current behaviour?) would be OK, right? The copy part is a much smaller problem than the rest of the function anyway, at least from my modest benchmarking,
I'd prefer that you simply modify PyArray_Clip to use a better approach rather than make an entirely new function. In that case, it certainly must make a copy.

-- Robert Kern

David Cournapeau wrote:
I would be happy to code the function; for new code to be added to numpy, is there another branch than the current one? What is the approach for a 1.1.x version of numpy?
The idea is to make a 1.0.x branch as soon as the trunk changes the C-API. The guarantee is that extension modules won't have to be rebuilt until 1.1. I don't know that we've specified if there will be *no* API changes. For example, there have already been some backward-compatible extensions to the 1.0.X series. I like the idea of being able to add functions to the 1.0.X series but without breaking compatibility. I also don't mind adding new keywords to functions (but not to C-API calls as that would require a re-compile of extension modules). -Travis

Robert Kern wrote:
Looking at the code, it's certainly not surprising that the current implementation of clip() is slow. It is a direct numpy C API translation of the following (taken from numarray, but it is the same in Numeric):
def clip(m, m_min, m_max):
    """clip() returns a new array with every entry in m that is less than m_min
    replaced by m_min, and every entry greater than m_max replaced by m_max.
    """
    selector = ufunc.less(m, m_min) + 2*ufunc.greater(m, m_max)
    return choose(selector, (m, m_min, m_max))
There are a lot of functions that are essentially this. Many things were done to just get something working. It would seem like a good idea to re-code many of these to speed them up.
Creating that integer selector array is probably the most expensive part. Copying the array, then using putmask() or similar is certainly a better approach, and I can see no drawbacks to it.
If anyone is up to translating their faster clip() into C, I'm more than happy to check it in. I might also entertain adding a copy=True keyword argument, but I'm not entirely certain we should be expanding the API during the 1.0.x series.
The problem with the copy=True keyword is that it would imply needing to expand the C-API for PyArray_Clip, and should not be done until 1.1 IMHO. We would probably be better off not expanding the keyword arguments to methods until then, either.

-Travis

Travis Oliphant wrote:
The problem with the copy=True keyword is that it would imply needing to expand the C-API for PyArray_Clip and should not be done until 1.1 IMHO.
I don't think we have to change the signature of PyArray_Clip() at all. PyArray_Clip() takes an "out" argument. Currently, this is only set to something other than NULL if explicitly provided as a keyword "out=" argument to numpy.ndarray.clip(). All we have to do is modify the implementation of array_clip() to parse a "copy=" argument and set "out = self" before calling PyArray_Clip().

-- Robert Kern
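In Python terms, the behaviour Robert describes would amount to something like this sketch (the copy= keyword is hypothetical -- it was never added in this form; only out= actually exists):

def clip_with_copy(a, vmin, vmax, copy=True):
    if copy:
        return a.clip(vmin, vmax)
    # copy=False plays the role of "out = self": clip in place
    return a.clip(vmin, vmax, out=a)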

Robert Kern wrote:
I don't think we have to change the signature of PyArray_Clip() at all. PyArray_Clip() takes an "out" argument. Currently, this is only set to something other than NULL if explicitly provided as a keyword "out=" argument to numpy.ndarray.clip(). All we have to do is modify the implementation of array_clip() to parse a "copy=" argument and set "out = self" before calling PyArray_Clip().
I admit to not following the clip discussion very closely, but if PyArray_Clip already supports 'out', why use a copy parameter at all? Why not just expose 'out' at the Python level? This allows in-place operations: "clip(m, m_min, m_max, out=m)"; it is more flexible than a copy argument and matches the interface of a whole pile of other functions.

My $0.02

-tim
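As Robert confirms below, out= is in fact already exposed on the method, so the in-place form works as-is:

import numpy as N

m = N.random.randn(5)
m.clip(-1., 1., out=m)   # clipped in place via out=; no copy is made
print m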

Tim Hochberg wrote:
I admit to not following the clip discussion very closely, but if PyArray_Clip already supports 'out', why use a copy parameter at all? Why not just expose 'out' at the Python level? This allows in-place operations: "clip(m, m_min, m_max, out=m)"; it is more flexible than a copy argument and matches the interface of a whole pile of other functions.
It's already exposed. I just didn't know that before I proposed copy=True (and when I learned it, my brain was already stuck in that mode).

-- Robert Kern

Travis Oliphant wrote:
There are a lot of functions that are essentially this. Many things were done to just get something working. It would seem like a good idea to re-code many of these to speed them up.
Off the top of your head, do you have a list of these?

-- Robert Kern

Travis Oliphant wrote:
The problem with the copy=True keyword is that it would imply needing to expand the C-API for PyArray_Clip and should not be done until 1.1 IMHO. We would probably be better off not expanding the keyword arguments to methods as well until that time.

When I went back home, I started taking a close look at the numpy/core C sources, with the help of the numpy ebook. The huge source files make it really difficult for me to follow some things: I was wondering whether there is some rationale behind this, or whether it is just a remnant of numpy's earlier development.

The main problem I have with those huge files is that I am confused about which functions are part of the public API, which exist for backward compatibility, and so on. I wanted to extract the PyArray_TakeFrom function to see where the time is spent, but this is quite difficult because of the various dependencies.

My question is then: is there any plan to change this? If not, is it for some reason I don't see, or just a lack of manpower?

cheers,
David

On 12/19/06, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote:
<snip>
My question is then: is there any plan to change this? If not, is it for some reason I don't see, or just a lack of manpower?
I raised the possibility of breaking up the files before, and Travis was agreeable to the idea. It is still in the back of my mind, but I haven't got around to doing anything about it. Maybe we should put together a step-by-step approach: agree on some file names for the new files, fix the build so it loads the new stub files in the correct order, and then start moving stuff. My own original desire was to break out the keyword parsers into a separate file, but I think Travis had different priorities.

Chuck

My question is then: is there any plan to change this? If not, is it for some reason I don't see, or just a lack of manpower?
The problem with separate files is (and has always been) the NumPy C-API. I tried to use separate files to some extent (and then use #include to make it all one big file). The C-API is exposed by filling in a table of function pointers. You will notice that when arrayobject.h is included by an extension module, all of the C-API is defined to pull a particular function pointer out of a table that is stored in a Python CObject in the multiarray extension module itself. Basically, NumPy is following the standard Python advice (as Numeric and Numarray did) about how to expose a C-API; it's just gotten a bit big.

Solutions to that problem are always welcome.

-Travis
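The table Travis mentions can be peeked at from Python; assuming the attribute name _ARRAY_API (which, if memory serves, is what arrayobject.h looks up in numpy 1.x), this prints the CObject wrapping the function-pointer table:

import numpy.core.multiarray
# extension modules unpack this at import time via import_array()
print numpy.core.multiarray._ARRAY_API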

On 12/20/06, Travis Oliphant <oliphant@ee.byu.edu> wrote:
Solutions to that problem are always welcome.
I've been thinking about that a bit. One solution is to have a small Python program that takes all the pieces and writes one big build file; I think something like that happens now. Another might be to use includes in a base file; there is nothing sacred about not including .c files or not putting code in .h files -- it is just a convention, and we could even choose another extension. I also wonder if we couldn't just link in object files. The table of function pointers just needs some addresses, and while the Python convention of hiding all the function names by using static functions is nice, it is probably not required. Maybe we could use ctypes in some way?

I am not pushing any of these alternatives at the moment, just putting them down. Maybe there are others?

Chuck

Charles R Harris wrote:
None that I want to think about. #including separate .c files, leaving the extension alone, is best, IMO.

-- Robert Kern

On 12/22/06, Robert Kern <robert.kern@gmail.com> wrote:
None that I want to think about. #including separate .c files, leaving the extension alone, is best, IMO.
I've studied a bit how exposing a C API from Python extensions works, at the python.org website. My understanding is that the problem when splitting into different files is that the C standard has no storage class equivalent to a "shared static", i.e. using a function in several C files of the same shared library without the function being exposed by the shared library.

One elegant solution for this is unfortunately non-portable: recent gcc versions have this functionality, called visibility support (added for C++, but it also works for C source files): http://gcc.gnu.org/wiki/Visibility

This document explains the different ways of limiting the symbols available in a DSO: http://people.redhat.com/drepper/dsohowto.pdf

Having several #includes of C files is the easiest way, and I guess this would be the safest way to start splitting the source files. A better way can always be adopted later, I guess. The question would then be: how do people think one should split the files? By topic (e.g. one file for array destruction/construction, one file for elementary operations, one file for the C API, etc.)? I am willing to spend some time on this, if it is considered useful,

cheers,
David

David Cournapeau wrote:
The question would then be: how do people think one should split the files? By topic (e.g. one file for array destruction/construction, one file for elementary operations, one file for the C API, etc.)?
I think it's useful, but I don't have time to think very much about it. I suspect anything semi-coherent that results in smaller files will be beneficial for editing purposes. The only real opinion I have at this point is that I'd like to see multiarraymodule.c contain little more than include statements (of headers and other .c files) and comments.

-Travis

David Cournapeau wrote:
<snip>
My question is then: is there any plan to change this? If not, is it for some reason I don't see, or just a lack of manpower?
I'm not sure what you mean by "this". I have no plans to change the infrastructure, but naturally suggestions are always welcome. You just have to understand and figure out the limitations of trying to expose a C-API. -Travis

Travis Oliphant wrote:
I'm not sure what you mean by "this". I have no plans to change the infrastructure, but naturally suggestions are always welcome. You just have to understand and figure out the limitations of trying to expose a C-API.

"this" was just about the big source files; I was wondering whether there was a rationale or not, and your previous email answered that: there is one. I don't have much experience with pure C Python modules, and if this is the standard Python way of doing things, I guess there is no other easy way.

Thank you for your explanation,
David
participants (9)

- Charles R Harris
- David Cournapeau
- David Cournapeau
- Eric Firing
- Gael Varoquaux
- Robert Kern
- Stefan van der Walt
- Tim Hochberg
- Travis Oliphant