Updating the table at runtime seems a good option. But then we have to maintain a separate file for caching and storing.

I will look at both op2calltree.py and gperftools.
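
For gperftools, the ctypes session quoted below could be wrapped up like this. It is only a sketch: `cpu_profile` is a name I made up, the library name is taken from the quoted session and may differ on other systems, and the `lib` argument exists only so a different handle can be passed in:

```python
import contextlib
import ctypes


@contextlib.contextmanager
def cpu_profile(filename, lib=None):
    """Profile the enclosed block with the gperftools CPU profiler."""
    if lib is None:
        # Library name as in the quoted session; adjust for your system.
        lib = ctypes.CDLL("libprofiler.so.0")
    # ProfilerStart expects a C string, so pass bytes.
    started = lib.ProfilerStart(filename.encode())
    if not started:
        raise RuntimeError("ProfilerStart failed for %r" % filename)
    try:
        yield
    finally:
        # Always stop the profiler so the output file gets written.
        lib.ProfilerStop()


# Usage (needs libprofiler installed):
# with cpu_profile("sum-int-scalar.prof"):
#     pass  # do stuff here
```

The resulting file can then be inspected with the pprof tools described on the gperftools page.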

>* Instead of making a giant table of everything that needs to be done
>to make stuff fast first, before writing any code, I'd suggest picking
>one operation, figuring out what change would be the biggest
>improvement for it, making that change, checking that it worked, and
>then repeat until that operation is really fast.
That is how I am working: first optimizing the sum operation specifically for int scalars, then I will move on to the others.
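
To check that each change actually helps, I plan to time the one operation before and after. A small sketch using the standard `timeit` module; `per_call_us` is a hypothetical helper name and the default counts are arbitrary:

```python
import timeit


def per_call_us(stmt, setup="pass", number=100000, repeat=5):
    """Return the best-of-`repeat` time per call, in microseconds."""
    best = min(timeit.repeat(stmt, setup=setup, number=number, repeat=repeat))
    return best / number * 1e6


# Example for the int scalar case (needs NumPy installed):
# per_call_us("x + y", setup="import numpy as np; x = np.int64(1); y = np.int64(2)")
```

Taking the minimum over several repeats keeps noise from other processes out of the comparison.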


>* Did you notice this line on the requirements page? "Having your
>first pull request merged before the GSoC application deadline (May 3)
>is required for your application to be accepted."
Thanks for the reminder!
I was too busy with my university exams and forgot to do that.
Does the merged pull request have to be related to the GSoC project, or can any other improvement be considered?



On Thu, May 2, 2013 at 6:44 PM, Nathaniel Smith <njs@pobox.com> wrote:
On Thu, May 2, 2013 at 6:26 AM, Arink Verma <arinkverma@iitrpr.ac.in> wrote:
> Yes, we need to ensure that.
> A code generator can be written that creates the code for the table of
> registered dtypes at build time itself.

I'd probably just generate it at run-time on an as-needed basis.
(I.e., use the full lookup logic the first time, then save the
result.) New dtypes can be registered, which will mean the tables need
to change size at runtime anyway. If someone does some strange thing
like add float16's and float64's, we can do the lookup to determine
that this should be handled by the float64/float64 loop, and then
store that information so that the next time it's fast (but we
probably don't want to be calculating all combinations at build-time,
which would require running the full type resolution machinery, esp.
since it wouldn't really bring any benefits that I can see).

* Re: the profiling, I wrote a full oprofile->callgrind format script
years ago: http://vorpus.org/~njs/op2calltree.py
Haven't used it in years either but neither oprofile nor kcachegrind
are terribly fast-moving projects so it's probably still working, or
could be made so without much work.
Or easier is to use the gperftools CPU profiler:
https://gperftools.googlecode.com/svn/trunk/doc/cpuprofile.html

Instead of linking to it at build time, you can just use ctypes:

In [7]: profiler = ctypes.CDLL("libprofiler.so.0")

In [8]: profiler.ProfilerStart("some-file-name-here")
Out[8]: 1

In [9]: # do stuff here

In [10]: profiler.ProfilerStop()
PROFILE: interrupts/evictions/bytes = 2/0/592
Out[10]: 46

Then all the pprof analysis tools are available as described on that webpage.

* Please don't trust those random suggestions for possible
improvements I threw out when writing the original description.
Probably it's true that FP flag checking and ufunc type lookup are
expensive, but one should fix what the profile says to fix, not what
someone guessed might be good to fix based on a few minutes thought.

* Instead of making a giant table of everything that needs to be done
to make stuff fast first, before writing any code, I'd suggest picking
one operation, figuring out what change would be the biggest
improvement for it, making that change, checking that it worked, and
then repeat until that operation is really fast. Then if there's still
time pick another operation. Producing a giant todo list isn't very
productive by itself if there's no time then to actually do all the
things on the list :-).

* Did you notice this line on the requirements page? "Having your
first pull request merged before the GSoC application deadline (May 3)
is required for your application to be accepted."

-n



--
Arink
Computer Science and Engineering
Indian Institute of Technology Ropar
www.arinkverma.in