[pypy-dev] [GSoC] Developing a benchmark suite (for Python 3.x)

DasIch dasdasich at googlemail.com
Fri Apr 8 20:21:12 CEST 2011


I talked to Fijal about my project last night. The result is that the
project as it stands is not that interesting, because the means to
execute the benchmarks on multiple interpreters are currently missing.

Another point we talked about was that porting the benchmarks would
not be very useful, as the interesting ones all have dependencies
which have not (yet) been ported to Python 3.x.

The first point, execution on multiple interpreters, has to be solved
or this project is pretty much pointless, so I have changed my
proposal to include exactly that. The proposal still includes porting
the benchmarks, but this is planned to happen after the development of
an application able to run the benchmarks on multiple interpreters.

The reason for this is that even though the ported benchmarks might
not prove to be that interesting on their own, the basic
infrastructure for porting with 2to3 would be in place, making it
easier to port further benchmarks in the future as their dependencies
become available under Python 3.x. I plan to do the porting only after
implementing the application mentioned above, which therefore has the
higher priority.

This way, should I not be able to complete all my goals, it is
unlikely that anything but the porting will suffer, and the project
would still produce useful results during the GSoC.

Anyway, here is the current, updated proposal:

Abstract
========

As of now there are several benchmark suites used by Python
implementations. PyPy uses the benchmarks[1] developed for the Unladen
Swallow[2] project as well as several other benchmarks implemented by
the PyPy developers themselves, while CPython[3] uses the Unladen
Swallow benchmarks and several "crap benchmarks used for historical
reasons"[4].

This makes comparisons unnecessarily hard and causes confusion. As a
solution to this problem I propose merging the existing benchmarks -
at least those considered worth having - into a single benchmark suite
which can be shared by all implementations and ported to Python 3.x.

Another problem reported by Maciej Fijalkowski is that currently the
way benchmarks are executed by PyPy is more or less a hack. Work will
have to be done to allow execution of the benchmarks on different
interpreters and their most recent versions (from their respective
repositories). The application for this should also be able to upload
the results to a codespeed instance such as http://speed.pypy.org.
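
To illustrate the upload step, here is a minimal sketch of how a
result could be sent to a codespeed instance; it assumes codespeed's
usual result/add endpoint, and all concrete values (revision, project,
executable, benchmark, environment) are placeholders::

    import urllib
    import urllib2

    # Placeholder values; the field names follow codespeed's
    # result/add interface as used by speed.pypy.org-style sites.
    data = {
        'commitid': '43246',
        'branch': 'default',
        'project': 'PyPy',
        'executable': 'pypy-c-jit',
        'benchmark': 'richards',
        'environment': 'benchmark-machine',
        'result_value': 0.42,      # e.g. average runtime in seconds
    }
    urllib2.urlopen('http://speed.pypy.org/result/add/',
                    urllib.urlencode(data))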

Milestones
==========
The project can be divided into several milestones:

1. Definition of the benchmark suite. This will entail contacting the
developers of the Python implementations (CPython, PyPy, IronPython
and Jython) via discussion on the appropriate mailing lists. This
might be achievable as part of this proposal.
2. Merging the benchmarks. Based on the prior agreed upon definition,
the benchmarks will be merged into a single suite.
3. Implementing a system to run the benchmarks. In order to execute
the benchmarks it will be necessary to have a configurable application
which downloads the interpreters from their repositories, builds them
and executes the benchmarks with them.
4. Porting the suite to Python 3.x. The suite will be ported to 3.x
using 2to3[5], as far as possible (see the sketch below). Using 2to3
will make it easier to make changes to the repository, especially for
those still focusing on 2.x. It is to be expected that some benchmarks
cannot be ported due to dependencies which are not available on Python
3.x. Those will be ignored by this project and ported at a later time,
when the necessary requirements are met.
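
As a rough sketch of the 2to3-based porting step (milestone 4), the
following uses the lib2to3 API that ships with CPython; the benchmark
filename is purely illustrative::

    from lib2to3.refactor import RefactoringTool, get_fixers_from_package

    # Load the standard fixers shipped with lib2to3.
    fixers = get_fixers_from_package("lib2to3.fixes")
    tool = RefactoringTool(fixers)

    # Rewrite a (copy of a) benchmark in place; the path is just an example.
    tool.refactor(["performance/bm_example.py"], write=True)

In practice the same effect can be had by running the 2to3 command
line tool with the -w option on the benchmark directories; the API
form is shown here because it could be integrated into a build or run
step.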

Start of Program (May 24)
=========================

Before the coding (milestones 2 and 3) can begin, it is necessary to
agree upon a set of benchmarks everyone is happy with, as described
above.

Midterm Evaluation (July 12)
============================

By the midterm I want to have merged the benchmarks and implemented a
way to execute them.

Final Evaluation (Aug 16)
=========================

In this period the benchmark suite will be ported. If everything works
out perfectly I will even have some time left; if there are problems,
I have a buffer here.

Implementation of the Benchmark Runner
======================================

In order to run the benchmarks I propose a simple application which
can be configured to download multiple interpreters, build them and
execute the benchmarks. The configuration could be similar to that of
tox[6], and downloads of the interpreters could be handled using
anyvc[7].
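
The following is a very rough sketch of what such a runner could look
like; the configuration format, section layout and script names are
assumptions for illustration, not a finished design::

    # Hypothetical outline of the benchmark runner; section and option
    # names are illustrative only.
    import subprocess
    from ConfigParser import ConfigParser

    def run_benchmarks(config_path):
        config = ConfigParser()
        config.read(config_path)
        for name in config.sections():        # one section per interpreter
            url = config.get(name, "repository")
            build = config.get(name, "build_command")
            executable = config.get(name, "executable")
            workdir = "build/%s" % name

            # Fetch the interpreter sources; anyvc could generalise this
            # beyond Mercurial, plain hg is used here to keep it short.
            subprocess.check_call(["hg", "clone", url, workdir])
            subprocess.check_call(build, shell=True, cwd=workdir)

            # Run the suite with the freshly built interpreter; the
            # results would then be uploaded to codespeed as shown above.
            subprocess.check_call([executable, "run_benchmarks.py"],
                                  cwd=workdir)

A matching configuration file would contain one section per
interpreter with its repository URL, build command and resulting
executable, loosely modelled on a tox.ini.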

For a site such as http://speed.pypy.org a cronjob, buildbot or
whatever else is preferred could be set up to execute the application
regularly.

Repository Handling
===================

The code for the project will be developed in a Mercurial[8]
repository hosted on Bitbucket[9]. Both PyPy and CPython use
Mercurial, and most people in the Python community should be able to
use it.

Probably Asked Questions
========================

Why not use one of the existing benchmark suites for porting?

The effort would be wasted if there is no good base to build upon;
creating a new benchmark suite based upon the existing ones ensures
that such a base exists.

Why not use Git/Bazaar/...?

Mercurial is used by CPython and PyPy, and it is fairly well known and
widely used in the Python community. This ensures easy accessibility
for everyone.

What will happen with the Repository after GSoC/How will access to the
repository be handled?

I propose to give administrative rights to one or two representatives
of each project. They will provide other developers with write
access.

Communication
=============

Progress will be communicated via Twitter[10] and my blog[11]; if
desired I can also send an email with the contents of each blog post
to the mailing lists of the implementations. Furthermore I am usually
quick to answer via IRC (DasIch on freenode), Twitter or e-mail
(dasdasich at gmail.com) if anyone has any questions.

Contact with the mentor can be established via the means mentioned
above or via Skype.

About Me
========
My name is Daniel Neuhäuser; I am 19 years old and currently a student
at the Bergstadt-Gymnasium Lüdenscheid[12]. I started programming
(with Python) about 4 years ago and became a member of the Pocoo
Team[13] after successfully participating in the Google Summer of Code
last year. During that project I ported Sphinx[14] to Python 3.x and
implemented an algorithm to diff abstract syntax trees in order to
preserve comments and translated strings, which has since been used by
the other GSoC projects targeting Sphinx.


.. [1]: https://bitbucket.org/pypy/benchmarks/src
.. [2]: http://code.google.com/p/unladen-swallow/
.. [3]: http://hg.python.org/benchmarks/file/tip/performance
.. [4]: http://hg.python.org/benchmarks/file/62e754c57a7f/performance/README
.. [5]: http://docs.python.org/library/2to3.html
.. [6]: http://codespeak.net/tox/
.. [7]: http://anyvc.readthedocs.org/en/latest/?redir
.. [8]: http://mercurial.selenic.com/
.. [9]: https://bitbucket.org/
.. [10]: http://twitter.com/#!/DasIch
.. [11]: http://dasdasich.blogspot.com/
.. [12]: http://bergstadt-gymnasium.de/
.. [13]: http://www.pocoo.org/team/#daniel-neuhauser
.. [14]: http://sphinx.pocoo.org/


