[Python-Dev] Computed Goto dispatch for Python 2
Parasa, Srinivas Vamsi
srinivas.vamsi.parasa at intel.com
Thu May 28 14:37:09 CEST 2015
Hi Matthias and Gregory,
The results shown were obtained with Python 2.7.10 built using gcc. Our team's goal is to make long-term open source contributions, with an emphasis on performance optimization and support for the larger community, which is why icc wasn't used.
We experimented with gcc profile-guided optimization (PGO) and LTO a month ago. PGO is an independent, orthogonal optimization, so it improves both the stock version (i.e. the current switch-based dispatch) and the computed-goto version. We ran PGO-optimized Python on the workloads available at the language benchmarks game (http://benchmarksgame.alioth.debian.org/u64/python.php) and found that PGO benefits the computed-goto version more than the stock version. I haven't yet run PGO-optimized Python with the "grand unified python benchmarks" (GUPB) suite; please give me a day or two and I will get back to you with PGO (and LTO) numbers as well. (LTO hasn't shown much benefit so far on the language benchmarks game workloads.)
Also, in our analysis using CPU performance counters, we found that Python workloads (in general) have a relatively high rate of CPU front-end issues (mainly I-cache misses), and PGO is very helpful in mitigating them. We're also investigating ways to further reduce those front-end issues and speed up Python workloads.
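For readers less familiar with the two dispatch styles being compared, here is a minimal sketch in C. It is a toy stack interpreter, not the code in CPython's ceval.c or in the attached patch: the opcode names and the functions run_switch, run_goto and check_dispatchers_agree are hypothetical, and the computed-goto variant relies on the "labels as values" extension supported by gcc and clang. The sketch also assumes well-formed bytecode and does no stack bounds checking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy opcodes; these are not CPython's real opcodes. */
enum { OP_PUSH1, OP_ADD, OP_HALT };

/* Switch-based dispatch: every opcode returns to one shared dispatch point. */
static int run_switch(const unsigned char *code)
{
    int stack[16];
    size_t sp = 0;

    for (;;) {
        switch (*code++) {
        case OP_PUSH1:
            stack[sp++] = 1;
            break;
        case OP_ADD:
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_HALT:
            return stack[sp - 1];
        default:
            return -1; /* unknown opcode */
        }
    }
}

/* Computed-goto dispatch: each opcode body ends with its own indirect jump. */
static int run_goto(const unsigned char *code)
{
    /* Table of label addresses, indexed by opcode (GCC/Clang extension). */
    static void *const targets[] = { &&do_push1, &&do_add, &&do_halt };
    int stack[16];
    size_t sp = 0;

#define DISPATCH() goto *targets[*code++]
    DISPATCH();
do_push1:
    stack[sp++] = 1;
    DISPATCH();
do_add:
    sp--;
    stack[sp - 1] += stack[sp];
    DISPATCH();
do_halt:
    return stack[sp - 1];
#undef DISPATCH
}

/* Sanity check: both dispatchers must agree on the same program. */
static void check_dispatchers_agree(const unsigned char *code)
{
    assert(run_switch(code) == run_goto(code));
}
```

The commonly cited reason for the speedup is that the switch version funnels every opcode through a single indirect branch, while the computed-goto version gives each opcode its own, which tends to help the CPU's branch predictor. How much that matters depends on the compiler, the CPU and the workload.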
From: Matthias Klose [mailto:doko at ubuntu.com]
Sent: Thursday, May 28, 2015 5:01 AM
To: Parasa, Srinivas Vamsi; 'python-dev at python.org'
Subject: Re: [Python-Dev] Computed Goto dispatch for Python 2
On 05/28/2015 02:17 AM, Parasa, Srinivas Vamsi wrote:
> Hi All,
> This is Vamsi from the Server Scripting Languages Optimization team at Intel Corporation.
> We would like to submit a request to enable computed-goto based dispatch in Python 2.x (it is enabled by default in Python 3, given its performance benefits on a wide range of workloads). We talked about this patch with Guido, and he encouraged us to submit a request on Python-dev (the email conversation with Guido is shown at the bottom of this email).
> Attached is the computed goto patch (along with instructions to run) for Python 2.7.10 (based on the patch submitted by Jeffrey Yasskin at http://bugs.python.org/issue4753). We built and tested this patch for Python 2.7.10 on a Linux machine (Ubuntu 14.04 LTS server, Intel Xeon - Haswell EP CPU with 18 cores, hyper-threading off, turbo off).
> Below is a summary of the performance we saw on the "grand unified python benchmarks" suite (available at https://hg.python.org/benchmarks/). We made 3 rigorous runs of the following benchmarks. In each rigorous run, a benchmark is run 100 times with and without the computed goto patch. Below we show the average performance boost for the 3 rigorous runs.
> [Table: Python 2.7.10 (original) vs. computed goto performance, by benchmark]
As Gregory pointed out, there are other options for building the interpreter, and we are missing data on how these compare with your patch.
I assume you tested with the Intel compiler, so it would be good to see results for other compilers as well (GCC, clang). Could you please provide the data for LTO and profile-guided optimized builds (maybe combined too)? I'm happy to work with you on setting up these builds, but currently don't have the machine resources to do so myself.
If the benefits show up for these configurations too, then I'm +/-0 on this patch.