
Hi Armin,

Thanks for your feedback. We ran one of the programs you suggested as an example for evaluation:

  cd rpython/jit/tl
  non-pgo-pypy ../../bin/rpython -O2 --source targettlr
  pgo-pypy ../../bin/rpython -O2 --source targettlr

We got the following results:

Non-PGO pypy:

  [Timer] Timings:
  [Timer] annotate                   --- 7.5 s
  [Timer] rtype_lltype               --- 5.8 s
  [Timer] backendopt_lltype          --- 3.6 s
  [Timer] stackcheckinsertion_lltype --- 0.1 s
  [Timer] database_c                 --- 19.6 s
  [Timer] source_c                   --- 2.6 s
  [Timer] =========================================
  [Timer] Total:                     --- 39.2 s

PGO pypy:

  [Timer] Timings:
  [Timer] annotate                   --- 7.6 s
  [Timer] rtype_lltype               --- 5.1 s
  [Timer] backendopt_lltype          --- 3.1 s
  [Timer] stackcheckinsertion_lltype --- 0.0 s
  [Timer] database_c                 --- 18.5 s
  [Timer] source_c                   --- 2.3 s
  [Timer] =========================================
  [Timer] Total:                     --- 36.6 s

The delta between the two is about 7% (36.6 s vs. 39.2 s total). We are working on getting the data to identify the percentage of interpreted vs. JIT-compiled code for both binaries. We are also working on creating a pull request to get better feedback on the change.

Regards,
Yash

________________________________________
From: Armin Rigo [armin.rigo@gmail.com]
Sent: Wednesday, November 02, 2016 2:18 AM
To: Singh, Yashwardhan
Cc: pypy-dev@python.org
Subject: Re: [pypy-dev] PGO Optimized Binary

Hi,

On 31 October 2016 at 22:28, Singh, Yashwardhan <yashwardhan.singh@intel.com> wrote:
We applied a compiler-assisted optimization technique called PGO (Profile-Guided Optimization) while building PyPy, and found that performance improved by up to 22.4% on the Grand Unified Python Benchmark (GUPB) from “hg clone https://hg.python.org/benchmarks”. The result table below shows that the majority of the 51 micro-benchmarks got a performance boost, while 8 showed a performance regression.
The kind of performance improvement you are measuring involves only short- or very short-running programs. A few years ago we'd have shrugged it off as irrelevant ("please modify the benchmarks so that they run for at least 10 seconds, more if they are larger") because the JIT compiler doesn't get a chance to warm up. But we'd also have shrugged off your whole attempt: "PGO optimization cannot change anything about the speed of JIT-produced machine code." Nowadays we tend to look more seriously at cold or warming-up performance too, or at least we know that we should. There are (stalled) plans to set up a second benchmark suite for PyPy that focuses on this.

You can get an estimate of whether you're looking at cold or hot code: compare the timings with CPython. Also, you can set the environment variable ``PYPYLOG=jit-summary:-`` and look at the first two lines of the summary to see how much time was spent warming up the JIT (or attempting to).

Note that we did enable PGO long ago, with modest benefits. We gave up when our JIT compiler became good enough. Maybe now is the time to try again (and also, PGO itself might have improved in the meantime).
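Concretely, that check might look like the following sketch, where "yourprog.py" is a placeholder for whatever workload you measured:

  # CPython baseline vs. PyPy: if the two timings are close, the PyPy
  # run is dominated by cold or warming-up code rather than hot JITted code
  time python yourprog.py
  time pypy yourprog.py

  # dump the jit-summary section to stderr on exit (the "-" means stderr);
  # its first two lines show how much time went into warming up the JIT
  PYPYLOG=jit-summary:- pypy yourprog.py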
We’d like to get some input on how to contribute our optimization recipe to the PyPy dev tree, perhaps by creating an item on the PyPy issue tracker?
The best would be to create a pull request so that we can look at your changes more easily.
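For reference, a minimal Mercurial workflow for that could look like the sketch below; the branch name is hypothetical, <your-user> is a placeholder, and this assumes the Bitbucket hosting that PyPy used at the time:

  hg clone https://bitbucket.org/pypy/pypy   # main PyPy repository (2016)
  cd pypy
  hg branch pgo-build                        # hypothetical branch name
  # ... apply the PGO changes to the build recipe ...
  hg commit -m "Build pypy-c with profile-guided optimization"
  hg push ssh://hg@bitbucket.org/<your-user>/pypy   # push to your own fork
  # then open the pull request from the Bitbucket web interface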
In addition, we would appreciate suggestions for any other benchmarks or real-world workloads we could use as alternatives to evaluate this.
You can take any Python program that either runs only very briefly or runs no faster than CPython. For a larger example (with Python 2.7):

  cd rpython/jit/tl
  python ../../bin/rpython -O2 --source targettlr   # 24 secs
  pypy ../../bin/rpython -O2 --source targettlr     # 39 secs

A bientôt,

Armin.