
Attached is a patch to allow pypy to use SQRTSD rather than calling out to libc. This resulted in a 2x speedup, that scaled as the benchmark took longer. When i added another 0 to the end of the benchmark it was still a 2x speedup. benchmark results: http://paste.pocoo.org/show/378122/ I'll be happy to answer any questions about the patch. Joe

Hi, On Tue, May 3, 2011 at 1:20 PM, Maciej Fijalkowski <fijall@gmail.com> wrote:
I fixed it, it was really a bug about FINISH(ConstFloat(...)). But the patch, at least as applied now, does not have the intended effect; SQRTSD is never generated because ll_math_sqrt() is called with a residual call. Joe, can you fix it? I can try to if you don't have the time. A bientôt, Armin.

On Tue, May 3, 2011 at 10:30 AM, Armin Rigo <arigo@tunes.org> wrote:
Another thing is it would be useful to test that the SQRTSD is generated, I don't know a good way to do that though. Alex -- "I disapprove of what you say, but I will defend to the death your right to say it." -- Evelyn Beatrice Hall (summarizing Voltaire) "The people's good is the highest law." -- Cicero

Hi Alex, On Tue, May 3, 2011 at 6:23 PM, Alex Gaynor <alex.gaynor@gmail.com> wrote:
Another thing is it would be useful to test that the SQRTSD is generated, I don't know a good way to do that though.
In the test, you can hack genop_math_sqrt() in-place to set a flag, and test that the flag is set... And another place it should also be tested is in test_pypy_c_new.py (pypy.module.pypyjit.test_pypy_c), i.e. you should write a test that checks that in a full translated pypy-c, the code corresponding to the Python source "math.sqrt(x)" indeed ends up being a CALL(sqrt_wrapper) and not a CALL(ll_math_sqrt). A bientôt, Armin.

Attached an updated patch. It actually generates the assembly now to use SQRTSD. It provides about the same performance as the previous patch. Because of that, I feel as though the performance gain can be attributed to the change in error checking. It might be worth splitting out the other math functions and slimming down the error checking. Right now in the new_unary_math_function it generates a math function that does 3 C library calls 2 errno calls and the math function itself. The patch takes this down to the minimal number of checks necessary for math.sqrt, making only an isfinite() call in addition to the sqrt() call best case. Previously it'd call "_error_reset(); c_func(); rposix.get_errno()" and then at least 1 call to isnan() and a call to either isnan() or isinf(). Does this seem like something worth looking into for the other math functions? It wouldn't be as hard as getting the whole assembly instruction working, and seems like it provides the majority of the performance gain. Joe On Tue, May 3, 2011 at 12:28 PM, Armin Rigo <arigo@tunes.org> wrote:

My mistake. There was a bug so it appeared as though SQRTSD wasn't being used while running the tests. It turns out the first patch I posted probably worked. After benchmarking, SQRTSD is definitely the cause of the speedup. Sorry for the confusion. Joe On Fri, May 6, 2011 at 1:29 PM, Joe <qbproger@gmail.com> wrote:

Hi, On Tue, May 3, 2011 at 1:20 PM, Maciej Fijalkowski <fijall@gmail.com> wrote:
I fixed it, it was really a bug about FINISH(ConstFloat(...)). But the patch, at least as applied now, does not have the intended effect; SQRTSD is never generated because ll_math_sqrt() is called with a residual call. Joe, can you fix it? I can try to if you don't have the time. A bientôt, Armin.

On Tue, May 3, 2011 at 10:30 AM, Armin Rigo <arigo@tunes.org> wrote:
Another thing is it would be useful to test that the SQRTSD is generated, I don't know a good way to do that though. Alex -- "I disapprove of what you say, but I will defend to the death your right to say it." -- Evelyn Beatrice Hall (summarizing Voltaire) "The people's good is the highest law." -- Cicero

Hi Alex, On Tue, May 3, 2011 at 6:23 PM, Alex Gaynor <alex.gaynor@gmail.com> wrote:
Another thing is it would be useful to test that the SQRTSD is generated, I don't know a good way to do that though.
In the test, you can hack genop_math_sqrt() in-place to set a flag, and test that the flag is set... And another place it should also be tested is in test_pypy_c_new.py (pypy.module.pypyjit.test_pypy_c), i.e. you should write a test that checks that in a full translated pypy-c, the code corresponding to the Python source "math.sqrt(x)" indeed ends up being a CALL(sqrt_wrapper) and not a CALL(ll_math_sqrt). A bientôt, Armin.

Attached an updated patch. It actually generates the assembly now to use SQRTSD. It provides about the same performance as the previous patch. Because of that, I feel as though the performance gain can be attributed to the change in error checking. It might be worth splitting out the other math functions and slimming down the error checking. Right now in the new_unary_math_function it generates a math function that does 3 C library calls 2 errno calls and the math function itself. The patch takes this down to the minimal number of checks necessary for math.sqrt, making only an isfinite() call in addition to the sqrt() call best case. Previously it'd call "_error_reset(); c_func(); rposix.get_errno()" and then at least 1 call to isnan() and a call to either isnan() or isinf(). Does this seem like something worth looking into for the other math functions? It wouldn't be as hard as getting the whole assembly instruction working, and seems like it provides the majority of the performance gain. Joe On Tue, May 3, 2011 at 12:28 PM, Armin Rigo <arigo@tunes.org> wrote:

My mistake. There was a bug so it appeared as though SQRTSD wasn't being used while running the tests. It turns out the first patch I posted probably worked. After benchmarking, SQRTSD is definitely the cause of the speedup. Sorry for the confusion. Joe On Fri, May 6, 2011 at 1:29 PM, Joe <qbproger@gmail.com> wrote:

Hi Joe, On Sat, May 7, 2011 at 6:15 AM, Joe <qbproger@gmail.com> wrote:
Sorry, the confusion probably came from my side. I missed that running the tests with interp_operations() would not set supports_floats=True :-( That's fixed now. Thank you again for your patience. Armin
participants (4)
-
Alex Gaynor
-
Armin Rigo
-
Joe
-
Maciej Fijalkowski