[Numpy-discussion] GSoC : Performance parity between numpy arrays and Python scalars
raul at virtualmaterials.com
Thu May 2 12:27:09 EDT 2013
For the sake of completeness, I don't think I ever mentioned what I used
to profile when I was working on speeding up the scalars. I used AQTime
7. It is commercial and only for Windows (as far as I know). It works
great and it gave me fairly accurate timings and all sorts of visual
navigation features. I do have to muck around with the numpy code every
time I want to compile it, to get it to play nicely with Visual Studio
and generate the proper bindings for the profiler.
On 02/05/2013 7:14 AM, Nathaniel Smith wrote:
> On Thu, May 2, 2013 at 6:26 AM, Arink Verma <arinkverma at iitrpr.ac.in> wrote:
>> Yes, we need to ensure that.
>> A code generator could be written that creates the table of registered
>> dtypes at build time itself.
> I'd probably just generate it at run-time on an as-needed basis.
> (I.e., use the full lookup logic the first time, then save the
> result.) New dtypes can be registered, which will mean the tables need
> to change size at runtime anyway. If someone does some strange thing
> like add float16's and float64's, we can do the lookup to determine
> that this should be handled by the float64/float64 loop, and then
> store that information so that the next time it's fast (but we
> probably don't want to be calculating all combinations at build-time,
> which would require running the full type resolution machinery, esp.
> since it wouldn't really bring any benefits that I can see).
> * Re: the profiling, I wrote a full oprofile->callgrind format script
> years ago: http://vorpus.org/~njs/op2calltree.py
> Haven't used it in years either but neither oprofile nor kcachegrind
> are terribly fast-moving projects so it's probably still working, or
> could be made so without much work.
> Or easier is to use the gperftools CPU profiler:
> Instead of linking to it at build time, you can just use ctypes:
> In : profiler = ctypes.CDLL("libprofiler.so.0")
> In : profiler.ProfilerStart("some-file-name-here")
> Out: 1
> In : # do stuff here
> In : profiler.ProfilerStop()
> PROFILE: interrupts/evictions/bytes = 2/0/592
> Out: 46
> Then all the pprof analysis tools are available as described on that webpage.
> * Please don't trust those random suggestions for possible
> improvements I threw out when writing the original description.
> Probably it's true that FP flag checking and ufunc type lookup are
> expensive, but one should fix what the profile says to fix, not what
> someone guessed might be good to fix based on a few minutes' thought.
> * Instead of making a giant table of everything that needs to be done
> to make stuff fast first, before writing any code, I'd suggest picking
> one operation, figuring out what change would be the biggest
> improvement for it, making that change, checking that it worked, and
> then repeat until that operation is really fast. Then if there's still
> time pick another operation. Producing a giant todo list isn't very
> productive by itself if there's no time then to actually do all the
> things on the list :-).
> * Did you notice this line on the requirements page? "Having your
> first pull request merged before the GSoC application deadline (May 3)
> is required for your application to be accepted."
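For anyone wanting to reuse the ctypes approach to the gperftools profiler quoted above, here is a rough sketch that wraps it in a context manager. The names load_profiler and cpu_profile are made up for this example. It assumes the library is installed as "libprofiler.so.0", as in the quoted session, and simply does nothing if it cannot be loaded. Note that on Python 3 the file name has to be passed as bytes, since ProfilerStart expects a C string.

```python
import contextlib
import ctypes


def load_profiler(lib_name="libprofiler.so.0"):
    """Return the gperftools profiler library, or None if it cannot be loaded."""
    try:
        lib = ctypes.CDLL(lib_name)
    except OSError:
        return None
    # ProfilerStart takes a C string (const char*) and returns an int.
    lib.ProfilerStart.argtypes = [ctypes.c_char_p]
    lib.ProfilerStart.restype = ctypes.c_int
    # ProfilerStop takes no arguments and returns nothing.
    lib.ProfilerStop.argtypes = []
    lib.ProfilerStop.restype = None
    return lib


@contextlib.contextmanager
def cpu_profile(output_path, lib=None):
    """Profile the enclosed block; yields True if profiling is active."""
    if lib is None:
        lib = load_profiler()
    # A nonzero return from ProfilerStart means the profiler started.
    started = bool(lib is not None and lib.ProfilerStart(output_path.encode()))
    try:
        yield started
    finally:
        if started:
            lib.ProfilerStop()


if __name__ == "__main__":
    with cpu_profile("some-file-name-here") as active:
        total = sum(i * i for i in range(100000))  # do stuff here
    print("profiling active:", active)
```

The output file can then be inspected with the pprof tools, as mentioned in the quoted message.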