Hi All,

 

This is Vamsi from the Server Scripting Languages Optimization team at Intel Corporation.

 

We would like to submit a request to enable computed-goto-based dispatch in Python 2.x (it is already enabled by default in Python 3 because of its performance benefits on a wide range of workloads). We discussed this patch with Guido, and he encouraged us to raise it on python-dev (our email conversation with Guido is included at the bottom of this email).

 

Attached is the computed goto patch for Python 2.7.10, along with instructions for running it. It is based on the patch submitted by Jeffrey Yasskin at http://bugs.python.org/issue4753. We built and tested the patch for Python 2.7.10 on a Linux machine (Ubuntu 14.04 LTS server, Intel Xeon Haswell-EP CPU with 18 cores, hyper-threading off, turbo off).

 

Below is a summary of the performance we saw on the "grand unified python benchmarks" suite (available at https://hg.python.org/benchmarks/). We made three rigorous runs of the benchmarks listed. In each rigorous run, a benchmark is run 100 times with and without the computed goto patch. For each benchmark we show the delta for each of the three rigorous runs and their average. Delta is the performance boost from the patch, so a negative value indicates a slowdown.

 

Python 2.7.10 (original) vs. computed goto: performance delta (%), per rigorous run and averaged

Benchmark              Run #1   Run #2   Run #3     Avg.
--------------------------------------------------------
iterative_count         24.48    24.36    23.78     24.2
unpack_sequence         19.06    18.47    19.48     19.0
slowspitfire            14.36    13.41    16.65     14.8
threaded_count          15.85    13.43    13.93     14.4
pystone                 10.68    11.67    11.08     11.1
nbody                   10.25     8.93     9.28      9.5
go                       7.96     8.76     7.69      8.1
pybench                   6.3      6.8      7.2      6.8
spectral_norm            5.49     9.37     4.62      6.5
float                    6.09      6.2     6.96      6.4
richards                 6.19     6.41     6.42      6.3
slowunpickle             6.37     8.78     3.55      6.2
json_dump_v2             1.96    12.53     3.57      6.0
call_simple              6.37     5.91     3.92      5.4
chaos                    4.57     5.34     3.85      4.6
call_method_slots        2.63     3.27     7.71      4.5
telco                    5.18     1.83     6.47      4.5
simple_logging           3.48     1.57      7.4      4.2
call_method              2.61      5.4     3.88      4.0
chameleon                2.03     6.26      3.2      3.8
fannkuch                 3.89     3.19     4.39      3.8
silent_logging           4.33     3.07     3.39      3.6
slowpickle               5.72    -1.12     6.06      3.6
2to3                     2.99      3.6     3.45      3.3
etree_iterparse          3.41     2.51        3      3.0
regex_compile            3.44     2.48     2.84      2.9
mako_v2                  2.14     1.29     5.22      2.9
meteor_contest           2.01      2.2     3.88      2.7
django                   6.68    -1.23     2.56      2.7
formatted_logging        1.97     5.82    -0.11      2.6
hexiom2                  2.83      2.1     2.55      2.5
django_v2                1.93     2.53     2.92      2.5
etree_generate           2.38     2.13     2.51      2.3
mako                     -0.3     9.66    -3.11      2.1
bzr_startup              0.35     1.97        3      1.8
etree_process            1.84     1.01      1.9      1.6
spambayes                1.76     0.76     0.48      1.0
regex_v8                 1.96    -0.66     1.63      1.0
html5lib                 0.83     0.72     0.97      0.8
normal_startup           1.41     0.39     0.24      0.7
startup_nosite            1.2     0.41     0.42      0.7
etree_parse              0.24      0.9     0.79      0.6
json_load                1.38     0.56    -0.25      0.6
pidigits                 0.45     0.33     0.28      0.4
hg_startup               0.32     2.07    -1.41      0.3
rietveld                 0.05     0.91    -0.43      0.2
tornado_http             2.34    -0.92    -1.27      0.1
call_method_unknown      0.72     1.26    -1.85      0.0
raytrace                -0.35    -0.75     0.94     -0.1
regex_effbot             1.97    -1.18    -2.57     -0.6
fastunpickle            -1.65      0.5    -0.88     -0.7
nqueens                 -2.24    -1.53    -0.81     -1.5
fastpickle              -0.74     1.98    -6.26     -1.7
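
For anyone who wants to check the numbers: the average delta is the arithmetic mean of the three per-run deltas, rounded to one decimal place. Below is a minimal sketch of that calculation in Python. The function name average_delta and the three sample rows are picked here for illustration only; this is not part of the benchmark suite or of the patch.

```python
def average_delta(deltas):
    """Return the arithmetic mean of the per-run deltas (%), rounded to one decimal."""
    return round(sum(deltas) / len(deltas), 1)


# Per-run deltas (%) copied from three rows of the results; sample only.
runs = {
    "iterative_count": [24.48, 24.36, 23.78],
    "nbody": [10.25, 8.93, 9.28],
    "fastpickle": [-0.74, 1.98, -6.26],
}

for name, deltas in sorted(runs.items()):
    # Print the benchmark name and its average delta across the three runs.
    print("%-16s %5.1f" % (name, average_delta(deltas)))
```

The values this produces for the three sample rows (24.2, 9.5 and -1.7) agree with the averages reported above.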

 

 

Thanks,

Vamsi

 

------------------------------------------------------------------------------------------------------------------------------------------------------------

From: gvanrossum@gmail.com [mailto:gvanrossum@gmail.com] On Behalf Of Guido van Rossum
Sent: Tuesday, May 19, 2015 1:59 PM
To: Cohn, Robert S
Cc: R. David Murray (r.david.murray@murrayandwalker.com)
Subject: Re: meeting at PyCon

 

Hi Robert and David,

I just skimmed that thread. There were a lot of noises about backporting it to 2.7 but the final message on the topic, by Antoine, claimed it was too late for 2.7. However, that was before we had announced the EOL extension of 2.7 till 2020, and perhaps we were also in denial about 3.x uptake vs. 2.x. So I think it's definitively worth bringing this up. I would start with a post on python-dev linking to the source code for your patch, and adding a message to the original tracker issue too (without reopening it though -- just so the people who were on the bug will be pinged about it).

Because of backwards compatibility with previous 2.7.x releases, it's very important that the patch not break anything -- in particular this means you can't add opcodes or change their specification. You will also undoubtedly be asked to test this on a variety of platforms 32-bit and 64-bit that people care about. But I'm sure you're expecting all that. :-)

 

You might also check with Benjamin Peterson, who is the 2.7 release manager. I think he just announced 2.7.10, so it's too late for that, but I assume we'll keep doing 2.7.x releases until 2020.

Good luck,

 

--Guido


PS. I am assuming you are contributing this under a PSF-accepted license, e.g. Apache 2.0, otherwise it's an automatic nogo.

 

On Tue, May 19, 2015 at 9:33 AM, Cohn, Robert S <robert.s.cohn@intel.com> wrote:

Hi Guido,

 

When we met for lunch at PyCon, I asked if performance-related patches would be OK for Python 2.x. My understanding is that you thought it was possible if it did not create a maintainability problem. We have an example now: a 2.7 patch for computed goto based on the implementation in Python 3 (http://bugs.python.org/issue4753). It increases performance by up to 10% across a wide range of workloads.

 

As I mentioned at lunch, we hired David Murray's company, and he is guiding Intel through the development process for CPython. David and I thought it would be good to run this by you before raising the issue on python-dev. Do you have a specific concern about this patch or a more general concern about performance patches to 2.7? Thanks.

 

Robert

--------