
Hi all,

I need some input on the benchmarking infrastructure. I'm nearly at the point where I need somewhere to run it before continuing (i.e. I need to actually try to use it, not just speculate).

What I have been thinking about, and need input on, is how to get at the interpreters that run the benchmarks. When we were talking about just benchmarks, and not profiling, my thought was to use whatever python the machine has and fetch the pypy from the last buildbot run. For profiling that will not work (and anyway, running the profiling on the standard python is quite pointless). So benchmarks will obviously have to specify, somehow, which interpreter(s) they should be run by.

The bigger question is how to get those interpreters. Should running the benchmarks also trigger building one (or more) pypy interpreters according to specs in the benchmarking framework? (But then, if you only want to run one benchmark, you may have to wait for all the interpreters to build.) Perhaps each benchmark should build its own interpreter, though that seems slow, given that most benchmarks can probably run on an identically built interpreter. Or maybe the installed infrastructure should only care about history, and if you want to run a single benchmark, you do that on your own.

Thoughts please!

/Anders
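To make the "benchmarks specify their interpreters" idea concrete, here is one hypothetical shape it could take (purely illustrative; nothing like this exists yet and all the names are invented):

    # bench_templates.py -- a hypothetical benchmark module.
    # The runner would read INTERPRETERS to decide which builds it needs
    # before it can run this benchmark.
    INTERPRETERS = [
        "cpython",           # whatever python the machine has
        "pypy-c",            # default translation from the last buildbot run
        "pypy-c-profiling",  # only built when profiling is requested
    ]

    def run():
        # the benchmark body; the runner times this call
        total = 0
        for i in range(1000000):
            total += i
        return total

The runner could then build or fetch only the interpreters that the selected benchmarks actually ask for.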

Anders Hammarquist wrote:
Hi all,
Hi Iko, hi all [cut]
I think this is a vital requirement. I can imagine various scenarios where you want to specify which interpreters to benchmark, e.g.:

1) benchmarking pypy-cli vs IronPython, or pypy-jvm vs Jython
2) benchmarking pypy-cs at different svn revisions
3) benchmarking pypy-c-trunk vs pypy-c-some-branch (maybe with the possibility of specifying pypy-c-trunk-at-the-revision-where-the-branch-was-created, to avoid noise)
4) benchmarking pypy-cs with different build options
5) benchmarking with profiling enabled (I'd say that profiling should be off by default)
Conceptually, I would say that you need to rebuild the required pypys every time you run the benchmarks. Concretely, we can think of putting them into a cache, so that if you need a pypy-c that has already been built for some reason, you just reuse it. Moreover, it could be nice if you could select the pypy to benchmark from a list of already-built pypys, if you want to save time.

Also, we may need to think about how to deal with excessive load: if every one of us tries to run their own set of benchmarks, the benchmarking machine could become too overloaded to be useful in any sense.

ciao,
Anto
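A rough sketch of what such a cache could look like (purely illustrative; the helper names and on-disk layout are invented, and the actual translation step is left as a stub):

    import hashlib
    import os

    CACHE_DIR = os.path.expanduser("~/pypy-build-cache")

    def build_pypy(revision, options, target):
        # Stub: this is where the real translation would be invoked and
        # the resulting pypy-c copied to `target`.
        raise NotImplementedError("hook up the real translation here")

    def cached_pypy(revision, options=()):
        # Key the cache on revision plus build options, so that e.g. a jit
        # build and a no-jit build of the same revision live side by side.
        key = hashlib.sha1(("%s %s" % (revision, " ".join(options))).encode()).hexdigest()
        target = os.path.join(CACHE_DIR, key, "pypy-c")
        if not os.path.exists(target):
            build_pypy(revision, options, target)
        return target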

Hi Anders.

I, personally, would start with a more modest goal than a fully running infrastructure. I would like to be able to run it myself, provided I have downloaded and compiled all the necessary interpreters. So say I want to run benchmarks a, b, c using python, pypy-a, pypy-b and pypy-c. Then I say something like:

    ./run_benchmarks --benchmarks="a b c" --interpreters="pypy-a pypy-b python pypy-c"

and get some sort of results, to start with in text form. The next step would be to have a backend that stores information between runs, but I would really, really like to go with an incremental approach, where I have something to start with and later on improve it and add features.

PS. Sorry for the late reply, I was on holiday.

Cheers,
fijal
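Something along these lines would already cover the command line above. This is only a minimal sketch, assuming each benchmark is a plain <name>.py script that the given interpreter can execute directly; everything here is illustrative, not an existing tool:

    #!/usr/bin/env python
    # run_benchmarks -- minimal sketch of the command line described above.
    import optparse
    import subprocess
    import time

    def time_one(interpreter, benchmark):
        # Run "interpreter benchmark.py" once and return the wall-clock time.
        start = time.time()
        subprocess.call([interpreter, benchmark + ".py"])
        return time.time() - start

    def main():
        parser = optparse.OptionParser()
        parser.add_option("--benchmarks", help="space-separated benchmark names")
        parser.add_option("--interpreters", help="space-separated interpreter executables")
        options, _ = parser.parse_args()
        for bench in options.benchmarks.split():
            for interp in options.interpreters.split():
                print "%-15s %-15s %8.3f s" % (bench, interp, time_one(interp, bench))

    if __name__ == "__main__":
        main()

A storage backend for comparing runs over time could then be added behind the text output without changing the command line.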

This is not an answer to your question, but it might interest you. When it comes to benchmarking, the following paper is certainly worth reading: http://www-plan.cs.colorado.edu/diwan/asplos09.pdf It explains how many benchmarks are biased and produce faulty measurements (to a significant degree), and how this can be compensated for.

Cheers,
Franck

Franck Pommereau wrote:
Seconded. This is an excellent paper. Btw, issues like this might be a reason to really reuse parts of the unladen-swallow benchmark runner, since they seem to have put work into doing the right thing from a statistics point of view. Cheers, Carl Friedrich
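For flavour, the statistical side boils down to something like this (just the general idea, not the unladen-swallow code): run each benchmark several times and report a mean with a spread rather than a single number.

    import math

    def summarize(timings):
        # timings: wall-clock times from repeated runs of one benchmark
        n = len(timings)
        mean = sum(timings) / n
        # sample standard deviation (n - 1), so at least two runs are needed
        stddev = math.sqrt(sum((t - mean) ** 2 for t in timings) / (n - 1))
        return mean, stddev

    # e.g.: mean, stddev = summarize([run_once() for _ in range(10)])
    # (run_once() is a placeholder for however a single timing is obtained)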

Anders Hammarquist wrote:
I wonder, would it be possible to add Cython to the benchmark loop? I would love to see it compared to PyPy, simply because both projects aim to compile Python code to C code (amongst other things, obviously). I know that Cython can't currently compete with PyPy in terms of feature completeness - it clearly lacks some very important features of the Python language, so it won't be able to run all benchmarks for a while, and the comparison would easily show where the black spots are that need fixing.

Just out of curiosity (and to whet your appetite :), I ran PyPy's richards benchmark unmodified in Cython (latest cython-unstable) and got this:

    python2.6 -c 'import richards; richards.main()'
    Richards benchmark (Python) starting... [...] finished.
    Total time for 10 iterations: 3.98 secs
    Average time per iteration: 398.44 ms

compared to CPython 2.6.2:

    python2.6 -c 'import richards; richards.main()'
    Richards benchmark (Python) starting... [...] finished.
    Total time for 10 iterations: 4.86 secs
    Average time per iteration: 485.97 ms

That's almost 20% faster, which IMO is not bad at all, given that Cython's main performance feature (C typing) wasn't used.

When I use an external .pxd file (attached) to redeclare the classes as extension types and to add a C nature to their methods (still without any benchmark code modifications), I get this:

    python2.6 -c 'import richards; richards.main(iterations=10)'
    Richards benchmark (Python) starting... [...] finished.
    Total time for 10 iterations: 0.99 secs
    Average time per iteration: 99.28 ms

That's almost a factor of five compared to CPython.

If possible, I would like to add both a normal Cython compiler run and a pxd-enabled run to the benchmark comparison with PyPy and CPython. Any chance this could be integrated? I'm asking now, because I imagine that the benchmarking framework will have to integrate the Cython compiler somehow, maybe using distutils or on-the-fly compilation with pyximport.
Stefan

cimport cython

cdef class Packet:
    cdef public object link
    cdef public object ident
    cdef public object kind
    cdef public object datum
    cdef public object data
    cpdef append_to(self, lst)

cdef class TaskRec:
    pass

cdef class DeviceTaskRec(TaskRec):
    cdef public object pending

cdef class IdleTaskRec(TaskRec):
    cdef public object control
    cdef public Py_ssize_t count

cdef class HandlerTaskRec(TaskRec):
    cdef public object work_in    # = None
    cdef public object device_in  # = None
    cpdef workInAdd(self, p)
    cpdef deviceInAdd(self, p)

cdef class WorkerTaskRec(TaskRec):
    cdef public object destination  # = I_HANDLERA
    cdef public Py_ssize_t count

cdef class TaskState:
    cdef public bool packet_pending  # = True
    cdef public bool task_waiting    # = False
    cdef public bool task_holding    # = False
    cpdef packetPending(self)
    cpdef waiting(self)
    cpdef running(self)
    cpdef waitingWithPacket(self)
    cpdef isPacketPending(self)
    cpdef isTaskWaiting(self)
    cpdef isTaskHolding(self)
    cpdef isTaskHoldingOrWaiting(self)
    cpdef isWaitingWithPacket(self)

cdef class TaskWorkArea:
    cdef public list taskTab          # = [None] * TASKTABSIZE
    cdef public object taskList       # = None
    cdef public Py_ssize_t holdCount  # = 0
    cdef public Py_ssize_t qpktCount  # = 0

cdef class Task(TaskState):
    cdef public Task link        # = taskWorkArea.taskList
    cdef public object ident     # = i
    cdef public object priority  # = p
    cdef public object input     # = w
    cdef public object handle    # = r
    cpdef addPacket(self, Packet p, old)
    cpdef runTask(self)
    cpdef waitTask(self)
    cpdef hold(self)
    cpdef release(self, i)
    cpdef qpkt(self, Packet pkt)
    cpdef findtcb(self, id)

cdef class DeviceTask(Task):
    @cython.locals(d=DeviceTaskRec)
    cpdef fn(self, pkt, r)

cdef class HandlerTask(Task):
    @cython.locals(h=HandlerTaskRec)
    cpdef fn(self, pkt, r)

cdef class IdleTask(Task):
    @cython.locals(i=IdleTaskRec)
    cpdef fn(self, pkt, r)

cdef class WorkTask(Task):
    @cython.locals(w=WorkerTaskRec)
    cpdef fn(self, pkt, r)

@cython.locals(t=Task)
cpdef schedule()

cdef class Richards:
    cpdef run(self, iterations)
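As for integrating the Cython compiler into the framework, the pyximport route mentioned above is probably the least intrusive one. A minimal sketch, assuming richards.py and a richards.pxd like the one above sit next to each other on the path, and that the installed Cython supports compiling plain .py modules via pyimport=True:

    import pyximport
    # pyimport=True sends plain .py modules through Cython as well,
    # not just .pyx files.
    pyximport.install(pyimport=True)

    import richards   # compiled on the fly; the matching richards.pxd is picked up
    richards.main(iterations=10)

A distutils-based build step would work just as well; the runner only needs the compiled extension module importable before timing starts.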

On Oct 2, 2009, at 5:20 PM, Stefan Behnel wrote:
I think you did some really interesting experiments, but the projects don't aim at the same thing at all. The PyPy python interpreter does not compile python code to C; it interprets it and uses a JIT to dynamically compile code (and that directly to machine code). The PyPy python interpreter is meant to be a fully compatible python interpreter, and it doesn't depend on any CPython code (like Cython does). I think it is only interesting to compare pypy to other python interpreters, i.e. anything that fully supports the python language. I do find Cython cool, but as it doesn't try to be a python interpreter there would be no point in doing that. What I would like to see are comparisons against Python 2.5-2.7, 3.1, Unladen Swallow, Psyco 2.0 and IronPython.

--
Leonardo Santagada
santagada at gmail.com

On Fri, Oct 02, 2009 at 18:49 -0300, Leonardo Santagada wrote:
I am sure Stefan is fully aware of what PyPy is.
Sure, and Jython. I don't see harm in adding Cython for the benchmarks it understands, provided it's all easy enough. But let's first get started on automatically benchmarking pypy trunk/selected branches over the revisions and comparing them to one or multiple CPython versions.

best,
holger
participants (8)

- Anders Hammarquist
- Antonio Cuni
- Carl Friedrich
- fijall
- Franck Pommereau
- holger krekel
- Leonardo Santagada
- Stefan Behnel