[Numpy-discussion] GSoC : Performance parity between numpy arrays and Python scalars

Nathaniel Smith njs at pobox.com
Thu May 2 09:14:32 EDT 2013

On Thu, May 2, 2013 at 6:26 AM, Arink Verma <arinkverma at iitrpr.ac.in> wrote:
> Yes, we need to ensure that..
> Code generator can be made, which can create code for table of registered
> dtype during build time itself.

I'd probably just generate it at run-time on an as-needed basis.
(I.e., use the full lookup logic the first time, then save the
result.) New dtypes can be registered, which will mean the tables need
to change size at runtime anyway. If someone does some strange thing
like add float16's and float64's, we can do the lookup to determine
that this should be handled by the float64/float64 loop, and then
store that information so that the next time it's fast. We probably
don't want to calculate all combinations at build time, though: that
would require running the full type resolution machinery, and it
wouldn't bring any benefits that I can see.
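
To make the "look it up once, then remember" idea concrete, here is a rough sketch in plain Python. Everything in it is made up for illustration: `slow_resolve`, `find_loop`, `PROMOTION_ORDER` and the string dtype names are stand-ins, not NumPy's real type resolution code, and the real thing would live in C.

```python
# Hypothetical stand-in for the full type resolution machinery.
# The names and the promotion rule here are illustrative only.
PROMOTION_ORDER = ["float16", "float32", "float64"]


def slow_resolve(a, b):
    """Pretend-expensive lookup: both operands are handled by the wider dtype's loop."""
    slow_resolve.calls += 1
    wider = max(a, b, key=PROMOTION_ORDER.index)
    return (wider, wider)


slow_resolve.calls = 0

# Cache filled on demand. A dict grows as needed, so dtypes registered
# later simply add new keys at run time.
_loop_cache = {}


def find_loop(a, b):
    """Return the loop signature for (a, b), resolving it only on first use."""
    key = (a, b)
    try:
        return _loop_cache[key]
    except KeyError:
        loop = _loop_cache[key] = slow_resolve(a, b)
        return loop


first = find_loop("float16", "float64")   # full lookup
second = find_loop("float16", "float64")  # served from the cache
```

Only the combinations that actually get used ever land in the table, which is the point of doing this at run time rather than at build time.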

* Re: the profiling, I wrote a full oprofile->callgrind format script
years ago: http://vorpus.org/~njs/op2calltree.py
I haven't used it in years, but neither oprofile nor kcachegrind
is a terribly fast-moving project, so it's probably still working, or
could be made so without much work.
Or easier is to use the gperftools CPU profiler:

Instead of linking to it at build time, you can just use ctypes:

In [7]: profiler = ctypes.CDLL("libprofiler.so.0")

In [8]: profiler.ProfilerStart("some-file-name-here")
Out[8]: 1

In [9]: # do stuff here

In [10]: profiler.ProfilerStop()
PROFILE: interrupts/evictions/bytes = 2/0/592
Out[10]: 46

Then all the pprof analysis tools are available as described on that webpage.
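
If you end up doing this a lot, the start/stop pair can be wrapped in a small context manager. This is only a sketch: `cpu_profile` is a made-up helper name, the library name is the one from the session above and may differ on your system, and the check on the return value assumes `ProfilerStart` returns nonzero on success, as it did above. On Python 3 the file name has to be passed as bytes so that ctypes hands the C function a `char*`.

```python
import ctypes
from contextlib import contextmanager


@contextmanager
def cpu_profile(path, lib=None):
    """Profile the enclosed block, writing samples to `path`.

    `lib` defaults to the gperftools profiler loaded via ctypes; passing
    another object with ProfilerStart/ProfilerStop is mainly for testing.
    """
    if lib is None:
        # Library name as used in the session above; adjust if needed.
        lib = ctypes.CDLL("libprofiler.so.0")
    # Encode to bytes so the C side receives a char* on Python 3.
    if not lib.ProfilerStart(path.encode()):
        raise RuntimeError("ProfilerStart failed")
    try:
        yield
    finally:
        # Always stop the profiler, even if the profiled code raises.
        lib.ProfilerStop()
```

Usage would then be `with cpu_profile("some-file-name-here"): ...` around the code to be measured.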

* Please don't trust those random suggestions for possible
improvements I threw out when writing the original description.
Probably it's true that FP flag checking and ufunc type lookup are
expensive, but one should fix what the profile says to fix, not what
someone guessed might be good to fix based on a few minutes' thought.

* Instead of first making a giant table of everything that needs to be
done to make stuff fast, before writing any code, I'd suggest picking
one operation, figuring out what change would be the biggest
improvement for it, making that change, checking that it worked, and
then repeating until that operation is really fast. Then if there's
still time, pick another operation. Producing a giant todo list isn't
very productive by itself if there's no time left to actually do all
the things on the list :-).
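
For the "checking that it worked" step, a tiny timing helper built on the standard `timeit` module is usually enough. This is a sketch: `best_time` is a made-up name, and the plain Python float addition is just a baseline to compare against. For the NumPy side you would swap in a setup string that creates NumPy scalars instead.

```python
import timeit


def best_time(stmt, setup="pass", number=100_000, repeat=5):
    """Return the best per-call time in seconds over `repeat` runs."""
    runs = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    # The minimum is the run least disturbed by other activity on the machine.
    return min(runs) / number


# Baseline: adding two plain Python floats.
baseline = best_time("x + y", setup="x = 1.0; y = 2.0")
print(f"python float add: {baseline * 1e9:.1f} ns per call")
```

Run the same measurement before and after each change, and only keep the change if the number for the chosen operation actually goes down.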

* Did you notice this line on the requirements page? "Having your
first pull request merged before the GSoC application deadline (May 3)
is required for your application to be accepted."

