Benchmark updates
Hi everyone,
I did some work on the benchmark running & publishing yesterday. The results are now hosted at http://pandas.pydata.org/speed/, so pandas' results are at http://pandas.pydata.org/speed/pandas. An RSS feed of regressions is available at http://pandas.pydata.org/speed/pandas/regressions.xml. I plan to track that and manually open issues if they seem legitimate.
The runs are now triggered and monitored by Apache Airflow (instead of the cron job I had set up). This gives us a nice dashboard with the ability to view logs and see when benchmarks fail (and eventually email alerts). The dashboard isn't exposed publicly. If you have SSH access to the benchmark server, you can create a tunnel to port 8080:
ssh -L 8080:localhost:8080 pandas@panda.likescandy.com
Tom
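For anyone who would rather poll the regressions feed from a script than from a feed reader, here is a minimal sketch using only the Python standard library. It is not part of the benchmark setup described above. The function names are hypothetical, and because the exact feed format is an assumption, the parser accepts both RSS 2.0 (<item> under <channel>) and Atom (namespaced <entry>).

```python
import urllib.request
import xml.etree.ElementTree as ET

# URL taken from the message above; everything else here is illustrative.
FEED_URL = "http://pandas.pydata.org/speed/pandas/regressions.xml"
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def entry_titles(xml_text):
    """Return the title of every entry in an RSS 2.0 or Atom feed."""
    root = ET.fromstring(xml_text)
    # RSS 2.0 nests <item> under <channel>; Atom puts namespaced <entry>
    # elements directly under the root <feed>.
    items = root.findall("./channel/item") or root.findall(ATOM_NS + "entry")
    titles = []
    for item in items:
        title = item.find("title")
        if title is None:
            title = item.find(ATOM_NS + "title")
        # A missing or empty <title> becomes an empty string.
        titles.append((title.text or "").strip() if title is not None else "")
    return titles


def fetch_titles(url=FEED_URL, timeout=30):
    """Download the feed (needs network access) and return its entry titles."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return entry_titles(resp.read())
```

Calling fetch_titles() periodically and comparing the result with the previous run would give a list of newly reported regressions to review by hand before opening issues.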
hey Tom,
This is really great. Any chance we can create a wiki or README about configuration in the event that the Airflow config needs to be recreated or changed?
Thanks again for setting this up, and keeping my coat closet (where the machine is located) toasty.
As one minor thing with the benchmarking, I've noticed that default Linux configs can be a little bit aggressive about throttling the CPU frequency. This can be edited in the cpufrequtils script, but at least on my laptop and desktop (Ubuntu 14.04) I find myself having to run "/etc/init.d/cpufrequtils restart" to get it to disable frequency scaling. This should probably happen at boot time, but I'm not sure yet how to do it. So we might want to document this so that we are getting the best quality performance data out of the machine.
- Wes
On Sun, Aug 20, 2017 at 8:21 AM, Tom Augspurger <tom.augspurger88@gmail.com> wrote:
Hi everyone,
I did some work on the benchmark running & publishing yesterday. The results are now hosted at http://pandas.pydata.org/speed/, so pandas' results are at http://pandas.pydata.org/speed/pandas. An RSS feed of regressions is available at http://pandas.pydata.org/speed/pandas/regressions.xml. I plan to track that and manually open issues if they seem legitimate.
The runs are now triggered and monitored by Apache Airflow (instead of the cron job I had set up). This gives us a nice dashboard with the ability to view logs and see when benchmarks fail (and eventually email alerts). The dashboard isn't exposed publicly. If you have SSH access to the benchmark server, you can create a tunnel to port 8080:
ssh -L 8080:localhost:8080 pandas@panda.likescandy.com
Tom
_______________________________________________ Pandas-dev mailing list Pandas-dev@python.org https://mail.python.org/mailman/listinfo/pandas-dev
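One way to confirm that frequency scaling is actually disabled before a benchmark run is to read the governor each CPU reports through the Linux cpufreq sysfs interface. The sketch below is illustrative only and is not something the benchmark machine is known to run. The function names are hypothetical, and treating the "performance" governor as meaning "scaling disabled" is an assumption that may not hold for every cpufreq driver.

```python
from pathlib import Path


def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Map each CPU directory name (e.g. 'cpu0') to its current scaling governor."""
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in sorted(Path(cpu_root).glob(pattern)):
        # gov_file is <cpu_root>/cpuN/cpufreq/scaling_governor
        cpu_name = gov_file.parent.parent.name
        governors[cpu_name] = gov_file.read_text().strip()
    return governors


def scaling_cpus(governors, wanted="performance"):
    """Return the CPUs whose governor differs from the wanted one (assumed 'performance')."""
    return sorted(name for name, gov in governors.items() if gov != wanted)
```

A benchmark wrapper could call scaling_cpus(read_governors()) at startup and log a warning, or refuse to run, if the returned list is not empty.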
I added a page to the wiki: https://github.com/pandas-dev/pandas/wiki/Benchmark-Machine
In theory, bootstrapping a new machine is as simple as an "ansible-playbook tests/full.yml", but I've probably made some changes manually that aren't in the playbook.
Agreed that we should get the system to be as stable as possible. https://haypo.github.io/category/benchmark.html has some useful information I think, starting with https://haypo.github.io/journey-to-stable-benchmark-system.html
Tom
On Tue, Aug 29, 2017 at 2:58 PM, Wes McKinney <wesmckinn@gmail.com> wrote:
hey Tom,
This is really great. Any chance we can create a wiki or README about configuration in the event that the Airflow config needs to be recreated or changed?
Thanks again for setting this up, and keeping my coat closet (where the machine is located) toasty.
As one minor thing with the benchmarking, I've noticed that default Linux configs can be a little bit aggressive about throttling the CPU frequency. This can be edited in the cpufrequtils script, but at least on my laptop and desktop (Ubuntu 14.04) I find myself having to run "/etc/init.d/cpufrequtils restart" to get it to disable frequency scaling. This should probably happen at boot time, but I'm not sure yet how to do it. So we might want to document this so that we are getting the best quality performance data out of the machine.
- Wes
Thanks! It can be hard to avoid "snowflake" setups, but when possible it's very nice. Conceivably in the future we could have other benchmark machines.
On Wed, Sep 6, 2017 at 8:17 AM, Tom Augspurger <tom.augspurger88@gmail.com> wrote:
I added a page to the wiki: https://github.com/pandas-dev/pandas/wiki/Benchmark-Machine
In theory, bootstrapping a new machine is as simple as an "ansible-playbook tests/full.yml", but I've probably made some changes manually that aren't in the playbook.
Agreed that we should get the system to be as stable as possible. https://haypo.github.io/category/benchmark.html has some useful information I think, starting with https://haypo.github.io/journey-to-stable-benchmark-system.html
Tom
Participants (2):
- Tom Augspurger
- Wes McKinney