[Python-Dev] Computed Goto dispatch for Python 2

Thu May 28 21:28:50 CEST 2015

Sorry for missing Julian's question. The GCC version used for the benchmarks was 4.8.2.
I will look into the discussion at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39284 and investigate.

> Julian Taylor jtaylor.debian at googlemail.com 
> Thu May 28 13:30:59 CEST 2015
> Won't this need Python compiled with GCC 5.1 to have any effect? Which
> compiler version was used for the benchmark?
> The issue that negated most computed goto improvements
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39284) was only closed
> very recently (r212172, 9f4ec746affbde1).

-----Original Message-----
From: Matthias Klose [mailto:doko at ubuntu.com] 
Sent: Thursday, May 28, 2015 5:01 AM
To: Parasa, Srinivas Vamsi; 'python-dev at python.org'
Subject: Re: [Python-Dev] Computed Goto dispatch for Python 2

On 05/28/2015 02:17 AM, Parasa, Srinivas Vamsi wrote:
> Hi All,
> 
> This is Vamsi from the Server Scripting Languages Optimization team at Intel Corporation.
> 
> We would like to submit a request to enable computed-goto-based dispatch in Python 2.x (it is enabled by default in Python 3, given its performance benefits on a wide range of workloads). We discussed this patch with Guido, and he encouraged us to submit a request on Python-dev (the email conversation with Guido is shown at the bottom of this email).
> 
> Attached is the computed goto patch (along with instructions to run it) for Python 2.7.10 (based on the patch submitted by Jeffrey Yasskin at http://bugs.python.org/issue4753). We built and tested this patch for Python 2.7.10 on a Linux machine (Ubuntu 14.04 LTS server, Intel Xeon Haswell EP CPU with 18 cores, hyper-threading off, turbo off).
> 
> Below is a summary of the performance we saw on the "grand unified python benchmarks" suite (available at https://hg.python.org/benchmarks/). We made 3 rigorous runs of the following benchmarks. In each rigorous run, a benchmark is run 100 times with and without the computed goto patch. Below we show the average performance boost for the 3 rigorous runs.
> 
> Python 2.7.10 (original) vs Computed Goto performance Benchmark

-1

As Gregory pointed out, there are other options for building the interpreter, and we are missing data on how these compare with your patch.

I assume you tested with the Intel compiler, so it would be good to see results for other compilers as well (GCC, clang). Could you please provide the data for LTO and profile-guided optimized builds (maybe combined too)? I'm happy to work with you on setting up these builds, but I currently don't have the machine resources to do so myself.

If the benefits show up for these configurations too, then I'm +/-0 on this patch.

Matthias
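
For readers following the thread, here is a minimal sketch of what "computed goto dispatch" means, compared with ordinary switch dispatch. It is not taken from the attached patch or from CPython's ceval.c. The opcodes, the function names run_switch and run_goto, and demo_program are all hypothetical, invented for illustration. The second function relies on the GCC "labels as values" extension (also accepted by clang) and falls back to the switch version on other compilers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy opcodes; these are NOT CPython's opcodes. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

#define STACK_MAX 16

/* Switch dispatch: every opcode is decoded at the single switch at the
 * top of the loop. */
static int run_switch(const int *code)
{
    int stack[STACK_MAX];
    int sp = 0;            /* index of the next free stack slot */
    size_t pc = 0;

    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH:
            assert(sp < STACK_MAX);
            stack[sp++] = code[pc++];
            break;
        case OP_ADD:
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_HALT:
            return stack[sp - 1];
        default:
            return -1;     /* unknown opcode */
        }
    }
}

/* Computed goto dispatch: each handler ends with its own indirect jump
 * to the next handler, looked up in a table of label addresses. */
static int run_goto(const int *code)
{
#if defined(__GNUC__)
    /* Table order must match the opcode enum above. */
    static void *const targets[] = {
        &&do_push, &&do_add, &&do_mul, &&do_halt
    };
    int stack[STACK_MAX];
    int sp = 0;
    size_t pc = 0;

/* No range check on the opcode here: the sketch trusts its input. */
#define DISPATCH() goto *targets[code[pc++]]

    DISPATCH();
do_push:
    assert(sp < STACK_MAX);
    stack[sp++] = code[pc++];
    DISPATCH();
do_add:
    sp--;
    stack[sp - 1] += stack[sp];
    DISPATCH();
do_mul:
    sp--;
    stack[sp - 1] *= stack[sp];
    DISPATCH();
do_halt:
    return stack[sp - 1];
#undef DISPATCH
#else
    /* Labels as values are unavailable; use the portable version. */
    return run_switch(code);
#endif
}

/* Computes (2 + 4) * 7. */
static const int demo_program[] = {
    OP_PUSH, 2, OP_PUSH, 4, OP_ADD, OP_PUSH, 7, OP_MUL, OP_HALT
};
```

Both run_switch(demo_program) and run_goto(demo_program) evaluate the same program and return the same result. The only difference is where the indirect branch lives: once at the top of the loop in the switch version, and once at the end of every handler in the computed goto version. Whether that translates into a speedup depends on the compiler and its optimization settings, which is the point raised in this thread.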