[pypy-dev] PGO Optimized Binary

Thu Nov 10 13:42:37 EST 2016

Hi

An 8% improvement there is very good if you can reproduce it across
multiple runs (I would expect the variance to be pretty high).

You can also try running with --jit off. This gives you an indication
of the speed of the interpreter, which is a part of warmup.
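
As a rough sketch of how one might check reproducibility, the script
below times a command several times and reports the mean, the sample
standard deviation, and the fraction of time saved relative to a
baseline. The helper names (time_command, summarize, relative_gain)
are made up for illustration and are not part of PyPy or RPython. The
binary names in the comment are the ones used in the quoted message.

```python
import statistics
import subprocess
import time


def time_command(cmd, runs=5):
    """Run `cmd` (a list of arguments) `runs` times.

    Returns the wall-clock duration of each run in seconds.
    """
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (mean, sample standard deviation) of the durations."""
    mean = statistics.mean(durations)
    # stdev needs at least two samples; report 0.0 for a single run.
    stdev = statistics.stdev(durations) if len(durations) > 1 else 0.0
    return mean, stdev


def relative_gain(baseline, candidate):
    """Fraction of the baseline time saved by the candidate."""
    return (baseline - candidate) / baseline


# Hypothetical usage, run from rpython/jit/tl (assumes both binaries
# are on PATH under the names used in the quoted message):
#
#   args = ["../../bin/rpython", "-O2", "--source", "targettlr"]
#   base, base_sd = summarize(time_command(["non-pgo-pypy"] + args))
#   pgo, pgo_sd = summarize(time_command(["pgo-pypy"] + args))
#   print(base, base_sd, pgo, pgo_sd, relative_gain(base, pgo))
#
# Inserting "--jit", "off" right after the binary name would time the
# interpreter alone.
```

If the gain is not clearly larger than the standard deviations, more
runs are needed before drawing a conclusion. For the single-run totals
quoted below (39.2 s and 36.6 s), relative_gain(39.2, 36.6) comes out
at roughly 0.066.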

On Wed, Nov 9, 2016 at 12:30 AM, Singh, Yashwardhan
<yashwardhan.singh at intel.com> wrote:
> Hi Armin,
>
>
> Thanks for your feedback.
> We ran one of the programs you suggested as an example for evaluation:
> cd rpython/jit/tl
> non-pgo-pypy ../../bin/rpython -O2 --source targettlr
> pgo-pypy ../../bin/rpython -O2 --source targettlr
>
> We got the following results:
> Non-PGO pypy:
> [Timer] Timings:
> [Timer] annotate                       ---  7.5 s
> [Timer] rtype_lltype                   ---  5.8 s
> [Timer] backendopt_lltype              ---  3.6 s
> [Timer] stackcheckinsertion_lltype     ---  0.1 s
> [Timer] database_c                     --- 19.6 s
> [Timer] source_c                       ---  2.6 s
> [Timer] =========================================
> [Timer] Total:                         --- 39.2 s
>
> PGO pypy:
> [Timer] Timings:
> [Timer] annotate                       ---  7.6 s
> [Timer] rtype_lltype                   ---  5.1 s
> [Timer] backendopt_lltype              ---  3.1 s
> [Timer] stackcheckinsertion_lltype     ---  0.0 s
> [Timer] database_c                     --- 18.5 s
> [Timer] source_c                       ---  2.3 s
> [Timer] =========================================
> [Timer] Total:                         --- 36.6 s
>
> The delta in performance between these two is about 8%.
>
> We are working on getting data to identify the percentage of interpreted code versus JIT-compiled code for both binaries. We are also working on creating a pull request to get better feedback on the change.
>
> Regards
> Yash
>
> ________________________________________
> From: Armin Rigo [armin.rigo at gmail.com]
> Sent: Wednesday, November 02, 2016 2:18 AM
> To: Singh, Yashwardhan
> Cc: pypy-dev at python.org
> Subject: Re: [pypy-dev] PGO Optimized Binary
>
> Hi,
>
> On 31 October 2016 at 22:28, Singh, Yashwardhan
> <yashwardhan.singh at intel.com> wrote:
>> We applied a compiler-assisted optimization technique called PGO, or Profile Guided Optimization, while building PyPy, and found that performance improved by up to 22.4% on the Grand Unified Python Benchmark (GUPB) from “hg clone https://hg.python.org/benchmarks”.  The result table below shows that the majority of the 51 microbenchmarks got a performance boost, while 8 regressed.
>
> The kind of performance improvement you are measuring involves only
> short- or very short-running programs.  A few years ago we'd have
> shrugged it off as irrelevant---"please modify the benchmarks so that
> they run for at least 10 seconds, more if they are larger"---because
> the JIT compiler doesn't have a chance to warm up.  But we'd also have
> shrugged off your whole attempt---"PGO optimization cannot change
> anything about the speed of JIT-produced machine code".
>
> Nowadays we tend to look more seriously at the cold or warming-up
> performance too, or at least we know that we should look there.  There
> are (stalled) plans of setting up a second benchmark suite for PyPy
> which focuses on this.
>
> You can get an estimate of whether you're looking at cold or hot code:
> compare the timings with CPython.  Also, you can set the environment
> variable  ``PYPYLOG=jit-summary:-`` and look at the first 2 lines to
> see how much time was spent warming up the JIT (or attempting to).
>
> Note that we did enable PGO long ago, with modest benefits.  We gave
> up when our JIT compiler became good enough.  Maybe now is the time to
> try again (and also, PGO itself might have improved in the meantime).
>
>> We’d like to get some input on how to contribute our optimization recipe to the PyPy dev tree, perhaps by creating an item in the PyPy issue tracker?
>
> The best would be to create a pull request so that we can look at your
> changes more easily.
>
>> In addition, we would also appreciate any other benchmarks or real-world workloads as alternatives for evaluating this.
>
> You can take any Python program that either runs only very briefly or
> is no faster than on CPython.  For a larger example (with Python 2.7):
>
>     cd rpython/jit/tl
>     python ../../bin/rpython -O2 --source targettlr    # 24 secs
>     pypy ../../bin/rpython -O2 --source targettlr        # 39 secs
>
>
> A bientôt,
>
> Armin.