GSoC: Performance parity between numpy arrays and Python scalars

Hi all! I have written my application [1] for "Performance parity between numpy arrays and Python scalars" [2]. It would be a great help if you could review it. Does it look achievable and deliverable for the project?
[1] http://www.google-melange.com/gsoc/proposal/review/google/gsoc2013/arinkverm...
[2] http://projects.scipy.org/scipy/wiki/SummerofCodeIdeas
-- Arink Computer Science and Engineering Indian Institute of Technology Ropar www.arinkverma.in


@Raul I will pull the new version and try to include that as well. What is wrong with macros as opposed to inline functions? Yes, the time for the ufunc call is reduced to almost half. For the lookup table, I am generating a key from the argument types and returning the appropriate value. [1]
@Chuck Yes, I did some profiling with oprofile for "python -m timeit -n 1000000 -s 'import numpy as np;x = np.asarray(1.0)' 'x+x'". See the data sheet. [2]
Every time a ufunc is invoked, the code has to check every single data type possible (bool, int, double, etc.) until it finds the best match for the data that the operation is being performed on. For scalars, we can return the best match directly from a pre-populated table. At present the implementation is not well structured and supports only addition for int+int and float+float. [1]
[1] https://github.com/arinkverma/numpy/commit/e2d8de7e7b643c7a76ff92bc1219847f9...
[2] https://docs.google.com/spreadsheet/ccc?key=0AnPqyp8kuQw0dG1hdjZiazE2dGtTY1J...
On Thu, May 2, 2013 at 12:09 AM, Raul Cota <raul@virtualmaterials.com> wrote:
It is great that you are looking into this!! We are currently running on a fork of numpy because we really need these performance improvements.
I noticed that, as suggested, you took from the pull request I posted a while ago for the PyObject_GetAttrString and PyObject_GetBuffer issues.
( https://github.com/raulcota/numpy )
A couple of comments on that,
- It seems you did not grab the latest revisions of the code I posted, which fix the style of the comments and 'attempt' to fix a reported issue with Python 3. I say 'attempt' because I thought it was fixed, but someone mentioned this was not correct.
- There was also some feedback from Nathaniel about not liking the macros and preferring inline functions. I have not gotten around to it, but it would be nice if you jumped on that boat.
On the hash lookup table: I haven't looked at the implementation, but the speedup is remarkable.
Cheers !
Raul
On 30/04/2013 8:26 PM, Arink Verma wrote:
Hi all! I have written my application [1] for "Performance parity between numpy arrays and Python scalars" [2]. It would be a great help if you could review it. Does it look achievable and deliverable for the project?
[1] http://www.google-melange.com/gsoc/proposal/review/google/gsoc2013/arinkverm... [2] http://projects.scipy.org/scipy/wiki/SummerofCodeIdeas
-- Arink Computer Science and Engineering Indian Institute of Technology Ropar www.arinkverma.in
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

On Thu, May 2, 2013 at 5:25 AM, Arink Verma <arinkverma@iitrpr.ac.in> wrote:
Every time a ufunc is invoked, the code has to check every single data type possible (bool, int, double, etc.) until it finds the best match for the data that the operation is being performed on. For scalars, we can return the best match directly from a pre-populated table. At present the implementation is not well structured and supports only addition for int+int and float+float. [1]
You are pointing out something that may well be the main difficulty: the code there is messy, and we need to ensure that optimisations don't preclude later extensions (especially with regard to new dtype addition). David
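
For readers following the thread, the lookup-table idea in the quoted paragraph can be illustrated with a minimal Python sketch. This is not NumPy's C implementation: the names `LOOPS`, `find_loop_linear`, `FAST_TABLE`, and `find_loop_table` are hypothetical and exist only for illustration, and the toy ignores casting between types entirely. It only contrasts scanning every registered signature with a single lookup keyed on the argument types.

```python
import operator

# Hypothetical registry of (input types) -> implementation, in search order.
# Stands in for a ufunc's list of inner loops; it is an assumption, not NumPy code.
LOOPS = [
    ((bool, bool), operator.or_),
    ((int, int), operator.add),
    ((float, float), operator.add),
    ((complex, complex), operator.add),
]


def find_loop_linear(a, b):
    """Scan every registered signature until one matches exactly."""
    for (ta, tb), func in LOOPS:
        if type(a) is ta and type(b) is tb:
            return func
    raise TypeError("no matching loop")


# Pre-populated table: one dictionary lookup replaces the scan.
FAST_TABLE = {sig: func for sig, func in LOOPS}


def find_loop_table(a, b):
    """Build a key from the argument types and return the stored loop."""
    try:
        return FAST_TABLE[(type(a), type(b))]
    except KeyError:
        raise TypeError("no matching loop") from None


def add(a, b):
    """Dispatch through the table, then call the selected loop."""
    return find_loop_table(a, b)(a, b)
```

Both lookups return the same function for a supported pair; the table version simply avoids walking the list. A table built once like this cannot see types registered later, which is the extensibility concern raised in the reply above.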

Yes, we need to ensure that. A code generator could be written that creates the table for the registered dtypes at build time. Also, at present there is a lot of duplicate code that attempts to work around these slow paths; simplifying that code is also required.
On Thu, May 2, 2013 at 10:12 AM, David Cournapeau <cournape@gmail.com> wrote:
You are pointing out something that may well be the main difficulty: the code there is messy, and we need to ensure that optimisations don't preclude later extensions (especially with regard to new dtype addition).
David

On Thu, May 2, 2013 at 11:26 AM, Arink Verma <arinkverma@iitrpr.ac.in> wrote:
Yes, we need to ensure that. A code generator could be written that creates the table for the registered dtypes at build time.
So dtypes can be registered at runtime as well. In an ideal world, 'native' numpy types would not be special cases. This is too big for a GSoC, but we should make sure we don't make it worse.
Also, at present there is a lot of duplicate code that attempts to work around these slow paths; simplifying that code is also required.
That there is room for consolidation would be an understatement :) David

On Thu, May 2, 2013 at 6:26 AM, Arink Verma <arinkverma@iitrpr.ac.in> wrote:
Yes, we need to ensure that. A code generator could be written that creates the table for the registered dtypes at build time.
I'd probably just generate it at run-time on an as-needed basis. (I.e., use the full lookup logic the first time, then save the result.) New dtypes can be registered, which will mean the tables need to change size at runtime anyway. If someone does some strange thing like add float16's and float64's, we can do the lookup to determine that this should be handled by the float64/float64 loop, and then store that information so that the next time it's fast (but we probably don't want to be calculating all combinations at build-time, which would require running the full type resolution machinery, esp. since it wouldn't really bring any benefits that I can see).
* Re: the profiling, I wrote a full oprofile->callgrind format script years ago: http://vorpus.org/~njs/op2calltree.py Haven't used it in years either but neither oprofile nor kcachegrind are terribly fast-moving projects so it's probably still working, or could be made so without much work. Or easier is to use the gperftools CPU profiler: https://gperftools.googlecode.com/svn/trunk/doc/cpuprofile.html
Instead of linking to it at build time, you can just use ctypes:
    In [7]: profiler = ctypes.CDLL("libprofiler.so.0")
    In [8]: profiler.ProfilerStart("some-file-name-here")
    Out[8]: 1
    In [9]: # do stuff here
    In [10]: profiler.ProfilerStop()
    PROFILE: interrupts/evictions/bytes = 2/0/592
    Out[10]: 46
Then all the pprof analysis tools are available as described on that webpage.
* Please don't trust those random suggestions for possible improvements I threw out when writing the original description. Probably it's true that FP flag checking and ufunc type lookup are expensive, but one should fix what the profile says to fix, not what someone guessed might be good to fix based on a few minutes thought.
* Instead of making a giant table of everything that needs to be done to make stuff fast first, before writing any code, I'd suggest picking one operation, figuring out what change would be the biggest improvement for it, making that change, checking that it worked, and then repeat until that operation is really fast. Then if there's still time pick another operation. Producing a giant todo list isn't very productive by itself if there's no time then to actually do all the things on the list :-).
* Did you notice this line on the requirements page? "Having your first pull request merged before the GSoC application deadline (May 3) is required for your application to be accepted."
-n
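
The "use the full lookup logic the first time, then save the result" suggestion in this message can be sketched in a few lines of Python. Everything here is a toy under stated assumptions: `_PRECEDENCE` stands in for the real type resolution machinery, and `resolve_slow`, `resolve`, `register_dtype`, and `stats` are hypothetical names that do not exist in NumPy. The point is only the shape of the technique: a cache filled on demand, which copes with dtypes registered at runtime.

```python
# Toy precedence order standing in for full type resolution (an assumption).
_PRECEDENCE = ["float16", "float32", "float64"]

# Cache of already-resolved (type, type) pairs, filled on demand.
_cache = {}

# Counts how often the slow path runs, for demonstration only.
stats = {"full_lookups": 0}


def resolve_slow(t1, t2):
    """Stand-in for the expensive lookup: pick the higher-precedence type."""
    stats["full_lookups"] += 1
    return max(t1, t2, key=_PRECEDENCE.index)


def resolve(t1, t2):
    """Return the cached result, running the slow lookup only on a miss."""
    key = (t1, t2)
    try:
        return _cache[key]
    except KeyError:
        result = _cache[key] = resolve_slow(t1, t2)
        return result


def register_dtype(name):
    """Register a new type at runtime and drop stale cache entries."""
    _PRECEDENCE.append(name)
    _cache.clear()
```

With this layout, adding a float16 to a float64 goes through the slow path once and is answered from the cache afterwards. Clearing the whole cache on registration is the simplest safe choice for a sketch; a real implementation might invalidate more selectively. Note that the cache lives in memory, so nothing needs to be computed at build time or stored in a separate file.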

Updating the table at runtime seems a good option, but then we would have to maintain a separate file for caching and storing. I will look at both op2calltree.py <http://vorpus.org/~njs/op2calltree.py> and gperftools.
* Instead of making a giant table of everything that needs to be done to make stuff fast first, before writing any code, I'd suggest picking one operation, figuring out what change would be the biggest improvement for it, making that change, checking that it worked, and then repeat until that operation is really fast.
I am working that way already: first optimizing the sum operation specifically for int scalars, then moving on to others.
* Did you notice this line on the requirements page? "Having your first pull request merged before the GSoC application deadline (May 3) is required for your application to be accepted."
Thanks for the reminder! I was too busy with my university exams and forgot to do that. Does the merged pull request have to be related to the GSoC project, or can any other improvement be considered?
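
On the gperftools suggestion from earlier in the thread, the ctypes approach could be wrapped roughly as follows. This is a sketch under assumptions: the helper name `cpu_profile` is hypothetical, the library name `libprofiler.so.0` is taken from the session shown in the thread and may differ on other systems, and the helper silently does nothing if the library cannot be loaded. One detail worth noting is that under Python 3 the file name has to be passed as bytes, since `ProfilerStart` expects a C string.

```python
import ctypes
import os
from contextlib import contextmanager


@contextmanager
def cpu_profile(out_path, lib_name="libprofiler.so.0"):
    """Profile the enclosed block with gperftools if the library is available.

    Yields True when profiling is active and False when the shared library
    could not be loaded (in which case the block simply runs unprofiled).
    """
    try:
        lib = ctypes.CDLL(lib_name)
    except OSError:
        lib = None

    if lib is not None:
        # Declare the C signatures so ctypes converts arguments correctly.
        lib.ProfilerStart.argtypes = [ctypes.c_char_p]
        lib.ProfilerStart.restype = ctypes.c_int
        lib.ProfilerStop.argtypes = []
        lib.ProfilerStop.restype = None
        lib.ProfilerStart(os.fsencode(out_path))
    try:
        yield lib is not None
    finally:
        if lib is not None:
            lib.ProfilerStop()
```

A block such as `with cpu_profile("scalar-add.prof"): ...` around the `x + x` loop being measured would then write a profile that the pprof tools can read, assuming gperftools is installed.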

On Thu, May 2, 2013 at 7:14 AM, Nathaniel Smith <njs@pobox.com> wrote:
* Did you notice this line on the requirements page? "Having your first pull request merged before the GSoC application deadline (May 3) is required for your application to be accepted."
Where is that last requirement? It seems out of line to me. Arink now has a pull request, but it looks intrusive enough and needs enough work that I don't think we can just put it in. Chuck

Charles R Harris <charlesr.harris <at> gmail.com> writes: [clip]
* Did you notice this line on the requirements page? "Having your first pull request merged before the GSoC application deadline (May 3) is required for your application to be accepted."
Where is that last requirement? It seems out of line to me. Arink now has a pull request, but it looks intrusive enough and needs enough work that I don't think we can just put it in.
Well, we wrote so here:
http://projects.scipy.org/scipy/wiki/SummerofCodeIdeas
but that's maybe just a mistake -- PSF states exactly the opposite:
http://wiki.python.org/moin/SummerOfCode/ApplicationTemplate2013
-- Pauli Virtanen

On Thu, May 2, 2013 at 6:45 PM, Pauli Virtanen <pav@iki.fi> wrote:
Well, we wrote so here:
http://projects.scipy.org/scipy/wiki/SummerofCodeIdeas
but that's maybe just a mistake -- PSF states exactly the opposite:
http://wiki.python.org/moin/SummerOfCode/ApplicationTemplate2013
It wasn't a mistake - the part of a PR process that is most interesting in the context of evaluating GSoC applications is the dialogue and how the submitter deals with feedback.
I forgot to add on that page (although I think it was in one of my emails) that the patch shouldn't be completely trivial - fixing a typo doesn't really tell us all that much. But in this case Chuck's suggestion on the PR of how to get something merged looks fine.
Cheers, Ralf

On Thu, May 2, 2013 at 11:49 AM, Ralf Gommers <ralf.gommers@gmail.com>wrote:
It wasn't a mistake - the part of a PR process that is most interesting in the context of evaluating GSoC applications is the dialogue and how the submitter deals with feedback.
I forgot to add on that page (although I think it was in one of my emails) that the patch shouldn't be completely trivial - fixing a typo doesn't really tell us all that much. But in this case Chuck's suggestion on the PR of how to get something merged looks fine.
My feeling is that learning to work with the community is part of the process after acceptance and one of the reasons there are mentors. You might get some bad choices skipping the submission/acceptance bit, but you might also close the door on people who are new to the whole thing. Ideally, the applicants would already have involved themselves with the community; in practice that may often not be the case. Chuck

On Thu, May 2, 2013 at 9:54 PM, Charles R Harris <charlesr.harris@gmail.com>wrote:
My feeling is that learning to work with the community is part of the process after acceptance and one of the reasons there are mentors. You might get some bad choices skipping the submission/acceptance bit, but you might also close the door on people who are new to the whole thing. Ideally, the applicants would already have involved themselves with the community; in practice that may often not be the case.
You may be right in all of that, but since there's a good chance that there are more applicants than slots I'd rather not make those bad choices if they're acceptable. Right now we have three solid proposals, from Arink, Blake and Surya. If we're lucky we'll get three slots, but if not then we'll have a tough choice to make. The application deadline is tomorrow, so now is the time for final tweaks to the proposals. After that of course the plan can still be worked out more, but it can't be edited on Melange anymore. Ralf

On Fri, May 3, 2013 at 12:29 AM, Ralf Gommers <ralf.gommers@gmail.com>wrote:
You may be right in all of that, but since there's a good chance that there are more applicants than slots I'd rather not make those bad choices if they're acceptable.
acceptable --> avoidable
Right now we have three solid proposals, from Arink, Blake and Surya. If we're lucky we'll get three slots, but if not then we'll have a tough choice to make.
The application deadline is tomorrow, so now is the time for final tweaks to the proposals. After that of course the plan can still be worked out more, but it can't be edited on Melange anymore.
Ralf

On Thu, May 2, 2013 at 6:30 PM, Ralf Gommers <ralf.gommers@gmail.com> wrote:
The application deadline is tomorrow, so now is the time for final tweaks to the proposals. After that of course the plan can still be worked out more, but it can't be edited on Melange anymore.
Terri can still make it editable on Melange if necessary. Josef

On Thu, May 2, 2013 at 6:47 PM, <josef.pktd@gmail.com> wrote:
Terri can still make it editable on Melange if necessary.
Arink, you still have work to do for a PR. Chuck.

I could hardly find anything to improve or correct... not even a typo in the docs? Where do we need to avoid the version checks?
On Fri, May 3, 2013 at 10:52 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
Arink, you still have work to do for a PR.
Chuck.
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
-- Arink Computer Science and Engineering Indian Institute of Technology Ropar www.arinkverma.in

I have created a new PR that removes one irrelevant version check. https://github.com/numpy/numpy/pull/3304/files On Fri, May 3, 2013 at 11:29 PM, Arink Verma <arinkverma@iitrpr.ac.in> wrote:
I could hardly find anything to improve or correct, not even a typo in the docs. Where do we need to avoid the version checks?
-- Arink Computer Science and Engineering Indian Institute of Technology Ropar www.arinkverma.in

On Fri, May 3, 2013 at 12:13 PM, Arink Verma <arinkverma@iitrpr.ac.in>wrote:
I have created a new PR, have removed one irrelevant version check. https://github.com/numpy/numpy/pull/3304/files
I made some remarks on the PR. The convention on numpy-discussion is bottom posting so you should do that to avoid future complaints. <snip> Chuck

For the sake of completeness, I don't think I ever mentioned what I used to profile when I was working on speeding up the scalars. I used AQTime 7. It is commercial and only for Windows (as far as I know). It works great and it gave me fairly accurate timings and all sorts of visual navigation features. I do have to muck around with the numpy code every time I want to compile it to get it to play nicely with Visual Studio to generate the proper bindings for the profiler. Raul On 02/05/2013 7:14 AM, Nathaniel Smith wrote:
On Thu, May 2, 2013 at 6:26 AM, Arink Verma <arinkverma@iitrpr.ac.in> wrote:
Yes, we need to ensure that.. Code generator can be made, which can create code for table of registered dtype during build time itself.
I'd probably just generate it at run-time on an as-needed basis. (I.e., use the full lookup logic the first time, then save the result.) New dtypes can be registered, which will mean the tables need to change size at runtime anyway. If someone does some strange thing like add float16's and float64's, we can do the lookup to determine that this should be handled by the float64/float64 loop, and then store that information so that the next time it's fast (but we probably don't want to be calculating all combinations at build-time, which would require running the full type resolution machinery, esp. since it wouldn't really bring any benefits that I can see).
* Re: the profiling, I wrote a full oprofile->callgrind format script years ago: http://vorpus.org/~njs/op2calltree.py Haven't used it in years either but neither oprofile nor kcachegrind are terribly fast-moving projects so it's probably still working, or could be made so without much work. Or easier is to use the gperftools CPU profiler: https://gperftools.googlecode.com/svn/trunk/doc/cpuprofile.html
Instead of linking to it at build time, you can just use ctypes:
In [6]: import ctypes
In [7]: profiler = ctypes.CDLL("libprofiler.so.0")
In [8]: profiler.ProfilerStart("some-file-name-here")
Out[8]: 1
In [9]: # do stuff here
In [10]: profiler.ProfilerStop()
PROFILE: interrupts/evictions/bytes = 2/0/592
Out[10]: 46
Then all the pprof analysis tools are available as described on that webpage.
* Please don't trust those random suggestions for possible improvements I threw out when writing the original description. Probably it's true that FP flag checking and ufunc type lookup are expensive, but one should fix what the profile says to fix, not what someone guessed might be good to fix based on a few minutes thought.
* Instead of making a giant table of everything that needs to be done to make stuff fast first, before writing any code, I'd suggest picking one operation, figuring out what change would be the biggest improvement for it, making that change, checking that it worked, and then repeat until that operation is really fast. Then if there's still time pick another operation. Producing a giant todo list isn't very productive by itself if there's no time then to actually do all the things on the list :-).
* Did you notice this line on the requirements page? "Having your first pull request merged before the GSoC application deadline (May 3) is required for your application to be accepted."
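For the ctypes route to the gperftools CPU profiler mentioned above, a slightly more defensive variant might look like the sketch below. It assumes libprofiler may or may not be installed and that Python 3 is in use, where the file name must be passed as bytes; the `cpu_profile` helper and the output path are made up for illustration. `ProfilerStart` and `ProfilerStop` are the gperftools C entry points used in the session above.

```python
import contextlib
import ctypes
import ctypes.util
import os
import tempfile


@contextlib.contextmanager
def cpu_profile(path):
    """Profile the enclosed block with gperftools if libprofiler is found.

    Yields True when profiling is active, False when the library is missing.
    """
    libname = ctypes.util.find_library("profiler")  # e.g. libprofiler.so.0
    if libname is None:
        yield False  # library not installed: run the block unprofiled
        return
    profiler = ctypes.CDLL(libname)
    # ProfilerStart takes a C string, so Python 3 needs bytes here.
    profiler.ProfilerStart(os.fsencode(path))
    try:
        yield True
    finally:
        profiler.ProfilerStop()


# Hypothetical output location for the profile data.
out_path = os.path.join(tempfile.gettempdir(), "scalar-add.prof")
with cpu_profile(out_path) as active:
    total = sum(i * i for i in range(1000))  # stand-in for the code under test
```

When `active` is True, the file at `out_path` can be handed to the pprof tools described on the gperftools page.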
-n

On Tue, Apr 30, 2013 at 8:26 PM, Arink Verma <arinkverma@iitrpr.ac.in>wrote:
Hi all! I have written my application[1] for *Performance parity between numpy arrays and Python scalars[2]. *It would be a great help if you view it. Does it look achievable and deliverable according to the project.
[1] http://www.google-melange.com/gsoc/proposal/review/google/gsoc2013/arinkverm... [2] http://projects.scipy.org/scipy/wiki/SummerofCodeIdeas
Hi Arink, Have you already done some profiling? That could be tricky at the C level. I'm also curious about the hash table: what gets hashed, and where do you get the improved efficiency? Admittedly, the way in which ufuncs currently detect scalars is a bit heavyweight, and a fast path for certain input values could help. Is that what you are doing? As to the schedule, I suspect that it may be a bit ambitious but I don't see that as fatal by any means. Identifying bottlenecks and experimenting with solutions would be useful work. Chuck
participants (8)
- Arink Verma
- Charles R Harris
- David Cournapeau
- josef.pktd@gmail.com
- Nathaniel Smith
- Pauli Virtanen
- Ralf Gommers
- Raul Cota