[Numpy-discussion] GSoC : Performance parity between numpy arrays and Python scalars

Raul Cota raul at virtualmaterials.com
Thu May 2 12:27:09 EDT 2013

For the sake of completeness: I don't think I ever mentioned which
profiler I used when I was working on speeding up the scalars. I used
AQTime 7. It is commercial and Windows-only (as far as I know). It works
great; it gave me fairly accurate timings and all sorts of visual
navigation features. I do have to muck around with the numpy code every
time I want to compile it, to get it to play nicely with Visual Studio
and generate the proper bindings for the profiler.


On 02/05/2013 7:14 AM, Nathaniel Smith wrote:
> On Thu, May 2, 2013 at 6:26 AM, Arink Verma <arinkverma at iitrpr.ac.in> wrote:
>> Yes, we need to ensure that.
>> A code generator could be written that creates the code for the table of
>> registered dtypes at build time.
> I'd probably just generate it at run-time on an as-needed basis
> (i.e., use the full lookup logic the first time, then save the
> result). New dtypes can be registered, which means the tables need
> to change size at runtime anyway. If someone does some strange thing
> like adding float16s and float64s, we can do the lookup to determine
> that this should be handled by the float64/float64 loop, and then
> store that information so that the next time it's fast. We probably
> don't want to calculate all combinations at build time: that would
> require running the full type resolution machinery, and it wouldn't
> really bring any benefits that I can see.
>
> * Re: the profiling, I wrote a full oprofile-to-callgrind format
> conversion script years ago: http://vorpus.org/~njs/op2calltree.py
> I haven't used it in years either, but neither oprofile nor
> kcachegrind is a terribly fast-moving project, so it probably still
> works, or could be made to work without much effort.
>
> Or easier is to use the gperftools CPU profiler:
> https://gperftools.googlecode.com/svn/trunk/doc/cpuprofile.html
> Instead of linking to it at build time, you can just use ctypes:
>
> In [6]: import ctypes
> In [7]: profiler = ctypes.CDLL("libprofiler.so.0")
> In [8]: profiler.ProfilerStart("some-file-name-here")
> Out[8]: 1
> In [9]: # do stuff here
> In [10]: profiler.ProfilerStop()
> PROFILE: interrupts/evictions/bytes = 2/0/592
> Out[10]: 46
>
> Then all the pprof analysis tools are available as described on that webpage.
>
> * Please don't trust those random suggestions for possible
> improvements I threw out when writing the original description.
> It's probably true that FP flag checking and ufunc type lookup are
> expensive, but one should fix what the profile says to fix, not what
> someone guessed might be good to fix based on a few minutes' thought.
>
> * Instead of first making a giant table of everything that needs to
> be done to make stuff fast, before writing any code, I'd suggest
> picking one operation, figuring out what change would be the biggest
> improvement for it, making that change, checking that it worked, and
> then repeating until that operation is really fast. Then if there's
> still time, pick another operation. Producing a giant todo list isn't
> very productive by itself if there's then no time to actually do all
> the things on the list :-).
>
> * Did you notice this line on the requirements page? "Having your
> first pull request merged before the GSoC application deadline (May 3)
> is required for your application to be accepted."
>
> -n
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
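
For anyone who wants to try the ctypes approach quoted above from a
script rather than an interactive session, here is a minimal sketch.
The helper names load_profiler and cpu_profile are my own and purely
illustrative. It assumes the library exports ProfilerStart (taking a C
string and returning nonzero on success) and ProfilerStop, as used in
the quoted session. One thing to watch on Python 3: the file name has
to be passed as bytes, since ctypes would otherwise hand the library a
wide-character string.

```python
import ctypes
import ctypes.util
from contextlib import contextmanager


def load_profiler():
    """Return the gperftools CPU profiler library, or None if unavailable."""
    # find_library("profiler") searches for libprofiler; the fallback is the
    # soname used in the quoted session.
    for name in (ctypes.util.find_library("profiler"), "libprofiler.so.0"):
        if not name:
            continue
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue
    return None


@contextmanager
def cpu_profile(path, lib=None):
    """Profile the enclosed block, writing samples to `path`.

    Yields True if profiling started, False otherwise. Does nothing if the
    library cannot be loaded.
    """
    lib = lib if lib is not None else load_profiler()
    if lib is None:
        yield False
        return
    # ProfilerStart takes a C `const char *`, so pass bytes, not str.
    started = lib.ProfilerStart(path.encode())
    try:
        yield bool(started)
    finally:
        # Only stop the profiler if it was actually started.
        if started:
            lib.ProfilerStop()
```

Usage would look like `with cpu_profile("scalar-add.prof"): ...` around
the code being measured (the file name is just an example), after which
the resulting file can be fed to the pprof tools described on the
gperftools page.
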
