PyPy at last infinitely fast

Hi, speed.pypy.org currently shows a very encouraging performance picture for PyPy: it is "infinite times faster than CPython". No, it is not yet April 1st. Codespeed creates the front page plots using the latest tested revision... which currently has no data for pypy-c-jit (32 bits). There is already a ticket for that issue, as well as one to fix the Changes view, which lets you choose a revision with no data. The assumption was that a revision would be tested for all executables of a given project, which is no longer the case for PyPy. Still, even though there will be a fix, it may be a good idea to test all exes at the same revision to ease comparisons. Cheers, Miquel

On 18/07/11 21:37, Miquel Torres wrote:
hooray! We finally finished pypy :-)
this is not completely easy, because buildbot just pulls and updates to the latest revision. If someone pushes in between the two runs, the revision is different. A quick workaround would be to force buildbot to update to a more specific revision, e.g. "the highest revision of today at 00:00", or something like this. This should ensure that all our benchmarks/tests use the same revision. ciao, Anto

On 18/07/11 22:36, Miquel Torres wrote:
wow, all of this is very cool, thank you! However, there was a problem with uploading the results tonight :-/ http://buildbot.pypy.org/builders/jit-benchmark-linux-x86-32/builds/791/step... Do you know what it could be? ciao, Anto

I wanted to save you the time of looking it up and changing it, but well, I couldn't. Regarding the broken changes table: it is obviously a related problem, together with not making the user browse through empty data. All of those issues are caused by benchmarking some of the executables with one revision and others with different revisions. Each can be more or less solved with a couple of lines of code, but it would introduce quite a bit of overhead, so we need to consider (test/benchmark) the possible solution carefully. Could you imagine implementing Antonio's suggestion? Citing:
Sounds good to me. Miquel 2011/7/20 Maciej Fijalkowski <fijall@gmail.com>:

Hi Miquel, Maciek, all, On 20/07/11 22:01, Miquel Torres wrote:
I think that there was another issue: currently we pass a revision number like 12345:aabbccddee, but codespeed complained that it's not a valid hg revision (it was checking that it's exactly 40 chars long). I think that fijal fixed it, not sure how.
I think that codespeed should fix the behavior sooner or later; the current one looks broken to me. However, I'm fine with having just a workaround for the moment.
unfortunately, it's not that simple. In Mercurial there is no easy way to update to a specific date/time ("hg up --date" does not consider branches, so you might end up in a different branch than default). Moreover, we want to be able to manually kick off a benchmark run for e.g. 32 bit only but not 64, so the workaround would not work in this case. I propose a new workaround: instead of having pypy-c and pypy-c-64 both in the "tannit" environment, what about having tannit-32 and tannit-64? I think this would fix the issues, at the cost of not being able to have both the 32 and 64 bit plots in the same graph. What do you think? ciao, Anto
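(Editorial note: a branch-aware date query is expressible with Mercurial revsets, so the "pin everything to one changeset" idea could look roughly like the sketch below. This is purely illustrative, with made-up helper names and paths, and it does not address the second objection about manually kicking off a 32-bit-only run.)

    import subprocess

    def newest_default_rev_before(repo_path, cutoff):
        """Return the newest changeset on the default branch committed before
        `cutoff` (e.g. "2011-07-21 00:00"). Illustrative helper only, not part
        of the real buildbot configuration."""
        revset = "max(branch(default) and date('<%s'))" % cutoff
        out = subprocess.check_output(
            ["hg", "-R", repo_path, "log", "-r", revset, "--template", "{node}"])
        return out.strip()

    # Both the 32-bit and 64-bit benchmark runs could then be updated to the
    # same changeset (paths are hypothetical):
    # rev = newest_default_rev_before("/path/to/pypy", "2011-07-21 00:00")
    # subprocess.check_call(["hg", "-R", "/path/to/pypy", "update", "-r", rev])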

On Thu, Jul 21, 2011 at 9:09 AM, Antonio Cuni <anto.cuni@gmail.com> wrote:
by commenting out a check. Exact length is nonsense since we might pass 5 digits one day....
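(Editorial note: the rejected value was something like "12345:aabbccddee" and the check required exactly 40 hex characters. A more permissive check could look roughly like the sketch below; this illustrates the idea and is not Codespeed's actual validation code.)

    import re

    # Hypothetical, relaxed commit-id check: accept a full 40-char hash, a
    # short hex node id, a bare local revision number, or an hg-style
    # "12345:aabbccdd" pair, rather than insisting on exactly 40 characters.
    _COMMITID_RE = re.compile(r"^(\d+:)?[0-9a-f]{1,40}$")

    def looks_like_valid_commitid(commitid):
        return bool(_COMMITID_RE.match(commitid.lower()))

    assert looks_like_valid_commitid("12345:aabbccddee")   # hg rev:node pair
    assert looks_like_valid_commitid("a" * 40)             # full hash
    assert looks_like_valid_commitid("45678")              # bare rev number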
I think having them in the same graph is more important than having the changes page show correct things. I might give it a go if nobody else wants to.

On 21/07/11 09:13, Maciej Fijalkowski wrote:
Not sure. Having them in the same graph is important only to quickly spot cases in which one backend is much slower than the other, which seems not to be the case. On the other hand, to spot regressions it's enough to have them in two separate graphs. ciao, Anto

Hi Miquel, On 25/07/11 21:09, Miquel Torres wrote:
are you sure that having two separate environments will fix the "changes" page? I tried to add "tannit-64" (no results yet, a build is running right now), but in the dropdown menu of the Changes page I can see all the revisions that are also in "tannit". I'd have expected those two to be completely separate. ciao, Anto

Hi Antonio, I'm afraid you are right, the solution I proposed makes no sense. Sorry I gave you a wrong answer. A revision is unique to a project (well, now to a branch of a project), and thus revisions are not separated by environment. Codespeed was not really designed with revisions in mind that sometimes have results and sometimes don't. To solve that, revisions would need to depend on an executable as well, or we would need to introduce a check so that the revision list is tailored to a particular exe, but it would be ugly. There is a way, though, to "solve" it right now: separate the executables into two different projects, pypy32 and pypy64, instead of different environments. The revision list shown does change on-the-fly depending on the project the selected exe belongs to. Cheers, Miquel 2011/7/27 Antonio Cuni <anto.cuni@gmail.com>:

Hi Miquel, On 28/07/11 11:04, Miquel Torres wrote:
I can think of a semi-ugly workaround; I don't know how close it is to a working solution. If I understand correctly, the numbers displayed in the "changes" page are precomputed and saved into "Reports" by create_report_if_enough_data. I can see that the function finds the last_results to compare with simply by doing last_revs[1]. What about changing this logic to pick the latest rev which actually contains at least one result for the current environment?
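(Editorial note: a rough Django-ORM-style sketch of what that change could look like; the import path, model names and fields below are guesses for illustration and may not match Codespeed's actual schema.)

    # Hypothetical sketch of the workaround: instead of blindly taking
    # last_revs[1], walk back through recent revisions and compare against the
    # newest one that actually has a result for this environment.
    from codespeed.models import Revision, Result  # hypothetical import path

    def last_revision_with_results(project, environment, current_rev, max_back=20):
        candidates = (Revision.objects
                      .filter(project=project, date__lt=current_rev.date)
                      .order_by('-date')[:max_back])
        for rev in candidates:
            if Result.objects.filter(revision=rev, environment=environment).exists():
                return rev
        return None  # not enough data for a comparison report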
uhm, I don't see any option to select a different project from within speed.pypy.org. Is it simply because there is only one? Or would having another project mean visiting a completely different webpage? Moreover, I wonder how this problem relates to the upcoming speed.python.org: will every interpreter (cpython, pypy, jython, etc.) be a separate project? Will it be possible to compare results of different interpreters? ciao, Anto

Hi Antonio, I didn't answer before because I was on holidays.
That would be a quick workaround, yes, but we would quickly get into trouble. How would you then know which revision to use to compute the trend? A naive approach would be:

    trend_rev = last_rev_with_data - 10 * number_of_revs_without_data

(in the current case 1 for 32 bit, 1 for 64 bit, which means 1). Another would be:

    trend_rev = last_rev_with_data - 10 - number_of_revs_until_data_is_found

Those two revisions may or may not be the same. So if there is no guarantee that every saved revision has data for every executable in the project, there will be a random number of executable groups (grouped by the revision that is benchmarked). It *is* solvable, but it can quickly become a mess, and we may want to show other data or statistics in the future besides the trend. That is the reason I suggest using different projects when different executables are benchmarked at different revisions. I may be proven wrong, though, and the workaround might be implementable for changes, trend, and the rev list without a performance penalty.
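(Editorial note: to make the two alternatives concrete, here they are as plain Python; the names mirror Miquel's pseudo-formulas and are illustrative only.)

    # Two naive choices for the revision used to compute the trend, mirroring
    # the pseudo-formulas above; both assume sequentially numbered revisions,
    # which is exactly what breaks once not every revision has data for every
    # executable.

    def trend_rev_scaled(last_rev_with_data, revs_without_data, window=10):
        # go back `window` steps, scaled by how sparse the data is
        return last_rev_with_data - window * revs_without_data

    def trend_rev_offset(last_rev_with_data, revs_until_data_is_found, window=10):
        # go back `window` revisions, then keep walking until data is found
        return last_rev_with_data - window - revs_until_data_is_found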
First, speed.pypy.org only has a single project: PyPy (cpython exes are under a CPython project, but they are not shown in the changes or timeline views). Second, even with several projects you will currently only see a list of executables; improving that is a theme for the GSoC. I will be able to work on this next week, so maybe we can meet on IRC and have a go at it? (and close this huge email thread ;-) 2011/7/28 Antonio Cuni <anto.cuni@gmail.com>:

participants (3):
- Antonio Cuni
- Maciej Fijalkowski
- Miquel Torres