
On Thu, May 2, 2013 at 6:26 AM, Arink Verma <arinkverma@iitrpr.ac.in> wrote:
> Yes, we need to ensure that. A code generator could be made that creates the code for the table of registered dtypes at build time itself.
I'd probably just generate it at run-time on an as-needed basis. (I.e., use the full lookup logic the first time, then save the result.) New dtypes can be registered, which means the tables need to change size at run-time anyway. If someone does something strange like adding float16s and float64s, we can do the lookup to determine that this should be handled by the float64/float64 loop, and then store that information so that the next time it's fast. But we probably don't want to calculate all combinations at build time. That would require running the full type-resolution machinery, and it wouldn't really bring any benefits that I can see.

* Re: the profiling, I wrote a full oprofile->callgrind format script years ago: http://vorpus.org/~njs/op2calltree.py
  I haven't used it in years either, but neither oprofile nor kcachegrind is a terribly fast-moving project, so it's probably still working, or could be made to work without much effort. An easier option is the gperftools CPU profiler:
  https://gperftools.googlecode.com/svn/trunk/doc/cpuprofile.html
  Instead of linking to it at build time, you can just load it with ctypes:

  In [7]: profiler = ctypes.CDLL("libprofiler.so.0")

  In [8]: profiler.ProfilerStart("some-file-name-here")
  Out[8]: 1

  In [9]: # do stuff here

  In [10]: profiler.ProfilerStop()
  PROFILE: interrupts/evictions/bytes = 2/0/592
  Out[10]: 46

  Then all the pprof analysis tools are available as described on that webpage.

* Please don't trust those random suggestions for possible improvements that I threw out when writing the original description. It's probably true that FP flag checking and ufunc type lookup are expensive, but one should fix what the profile says to fix, not what someone guessed might be good to fix after a few minutes' thought.

* Instead of making a giant table of everything that needs to be done to make things fast before writing any code, I'd suggest picking one operation, figuring out which change would be the biggest improvement for it, making that change, checking that it worked, and then repeating until that operation is really fast. Then, if there's still time, pick another operation. Producing a giant to-do list isn't very productive by itself if there's no time left to actually do all the things on the list :-).

* Did you notice this line on the requirements page? "Having your first pull request merged before the GSoC application deadline (May 3) is required for your application to be accepted."

-n