[Numpy-discussion] GSoC : Performance parity between numpy arrays and Python scalars

Charles R Harris charlesr.harris at gmail.com
Thu May 2 11:36:30 EDT 2013


On Thu, May 2, 2013 at 7:14 AM, Nathaniel Smith <njs at pobox.com> wrote:

> On Thu, May 2, 2013 at 6:26 AM, Arink Verma <arinkverma at iitrpr.ac.in>
> wrote:
> > Yes, we need to ensure that.
> > A code generator could be written to create the table of registered
> > dtypes at build time.
>
> I'd probably just generate it at run-time on an as-needed basis
> (i.e., use the full lookup logic the first time, then save the
> result). New dtypes can be registered, which means the tables need
> to change size at runtime anyway. If someone does something strange
> like adding float16s and float64s, we can do the lookup to determine
> that this should be handled by the float64/float64 loop, and then
> store that information so that the next time it's fast. We probably
> don't want to calculate all combinations at build time: that would
> require running the full type resolution machinery, and it wouldn't
> bring any benefits that I can see.
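
The run-time, as-needed table described above could be sketched roughly as follows. This is a toy in pure Python, not NumPy's actual type resolution: dtypes are plain strings, and `PRECISION_ORDER`, `resolve_loop_slow`, `resolve_loop` and `stats` are hypothetical names introduced only for illustration.

```python
# Toy stand-in for ufunc type resolution; every name here is hypothetical.
PRECISION_ORDER = ["float16", "float32", "float64"]

stats = {"slow_lookups": 0}  # counts how often the expensive path runs


def resolve_loop_slow(left, right):
    """Pretend 'full lookup': choose the loop for the wider of the two types."""
    stats["slow_lookups"] += 1
    wider = max(left, right, key=PRECISION_ORDER.index)
    return (wider, wider)


# Filled in lazily, so the table can grow when new dtypes are registered.
_loop_cache = {}


def resolve_loop(left, right):
    """Return the loop signature, running the slow lookup only on first use."""
    key = (left, right)
    try:
        return _loop_cache[key]
    except KeyError:
        result = _loop_cache[key] = resolve_loop_slow(left, right)
        return result
```

With this sketch, `resolve_loop("float16", "float64")` runs the slow path once and returns the float64/float64 pair, and later calls with the same pair are a single dictionary lookup. A real implementation would also have to decide whether registering a new dtype invalidates existing entries.
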
>
> * Re: the profiling, I wrote a full oprofile->callgrind format script
> years ago: http://vorpus.org/~njs/op2calltree.py
> I haven't used it in years either, but neither oprofile nor kcachegrind
> is a terribly fast-moving project, so it's probably still working, or
> could be made to work without much effort.
> An easier option is to use the gperftools CPU profiler:
> https://gperftools.googlecode.com/svn/trunk/doc/cpuprofile.html
>
> Instead of linking to it at build time, you can just use ctypes:
>
> In [6]: import ctypes
>
> In [7]: profiler = ctypes.CDLL("libprofiler.so.0")
>
> In [8]: profiler.ProfilerStart("some-file-name-here")
> Out[8]: 1
>
> In [9]: # do stuff here
>
> In [10]: profiler.ProfilerStop()
> PROFILE: interrupts/evictions/bytes = 2/0/592
> Out[10]: 46
>
> Then all the pprof analysis tools are available as described on that
> webpage.
>
> * Please don't trust those random suggestions for possible
> improvements I threw out when writing the original description.
> Probably it's true that FP flag checking and ufunc type lookup are
> expensive, but one should fix what the profile says to fix, not what
> someone guessed might be good to fix based on a few minutes' thought.
>
> * Instead of making a giant table of everything that needs to be done
> to make stuff fast first, before writing any code, I'd suggest picking
> one operation, figuring out what change would be the biggest
> improvement for it, making that change, checking that it worked, and
> then repeating until that operation is really fast. Then, if there's
> still time, pick another operation. Producing a giant todo list isn't
> very productive by itself if there's no time left to actually do all
> the things on the list :-).
>
> * Did you notice this line on the requirements page? "Having your
> first pull request merged before the GSoC application deadline (May 3)
> is required for your application to be accepted."
>
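
The ctypes approach quoted above could be wrapped in a small context manager, sketched here under some assumptions. `load_profiler` and `cpu_profile` are hypothetical helper names, the library is looked up with `ctypes.util.find_library` rather than a hard-coded soname, and the block degrades to a no-op when gperftools is not installed. On Python 3 the file name has to be passed as bytes, because `ProfilerStart` expects a C string.

```python
import contextlib
import ctypes
import ctypes.util


def load_profiler():
    """Return a handle to gperftools' libprofiler, or None if it is unavailable."""
    name = ctypes.util.find_library("profiler")
    if name is None:
        return None
    try:
        return ctypes.CDLL(name)
    except OSError:
        return None


@contextlib.contextmanager
def cpu_profile(path):
    """Profile the enclosed block into `path`; yields True if profiling is active."""
    lib = load_profiler()
    if lib is not None:
        # ProfilerStart takes a const char*, so encode the path to bytes.
        lib.ProfilerStart(path.encode())
    try:
        yield lib is not None
    finally:
        if lib is not None:
            lib.ProfilerStop()
```

Wrapping the operation under study in `with cpu_profile("some-file-name-here"):` then leaves a profile that the pprof tools can read, provided the library was found.
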

Where is that last requirement? It seems out of line to me. Arink now has a
pull request, but it looks intrusive enough and needs enough work that I
don't think we can just put it in.

Chuck