pandas new infrastructure (OVH donation)
Hi all,

pandas has received a donation from OVHcloud <https://www.ovhcloud.com/> to support the project infrastructure, in the form of OVHcloud public cloud credits (an initial amount of 10,000 EUR for a period of one year). OVH is open to sponsoring longer term, and also other projects of the ecosystem (or NumFOCUS as a whole), but we started with this to get feedback at a smaller scale first.

The credits will initially be used for:
- Hosting the pandas website
- Running the pandas benchmarks
- Speeding up the project CI

Below I describe what I have in mind for each. If anyone is interested in getting involved, or has ideas or comments, please let me know. I'll publish updates here as this progresses.

- Website: I'm planning to experiment with splitting the website in two (it'll be transparent for users). The website and the stable docs, which receive most of the traffic, can probably be hosted on Cloudflare Pages. We're already using Cloudflare as a CDN, so instead of using it as a cache, we can publish the documents there directly. The rest of the docs (old versions and the dev version) can be hosted in OVHcloud bucket storage. Response times may be a bit slower, but our website is bigger than the Cloudflare quota, and keeping rarely accessed old docs in a CDN seems unnecessary anyway.

- Benchmarks: OVHcloud instances have guaranteed hardware, and we'll check whether that is enough for the benchmark results to be consistent across runs, or whether there is too much variability and we need dedicated hardware. If consistency is good enough, that would be great: our benchmarks mostly use one core, so dedicated hardware would likely waste most of its resources, since most servers have 16 cores or more. We'll discuss with OVH whether dedicated hardware is needed; at the moment their public cloud doesn't offer it (there is an alpha program for dedicated instances, but we need to check with them).

- Faster CI: Our GitHub runners are small, and most builds take around one hour or more to finish. We should be able to use bigger OVH instances for our existing CI fairly easily, via their OpenStack API and CIrun.
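For the curious, CIrun is driven by a small YAML file in the repository. A hypothetical sketch of what ours might look like follows; the flavor, image, and label names here are placeholders I made up for illustration, not actual pandas settings:

```yaml
# .cirun.yml -- illustrative sketch only; values are placeholders,
# not the actual pandas configuration.
runners:
  - name: openstack-runner
    cloud: openstack          # CIrun provisions via the OVH OpenStack API
    instance_type: b2-30      # hypothetical large instance flavor
    machine_image: ubuntu-22.04
    labels:
      - cirun-openstack-large # referenced from runs-on: in the workflows
```

A workflow job would then opt into the bigger machine by pointing its `runs-on:` at the label above, leaving the rest of the job definition unchanged.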
On Sat, 5 Nov 2022 at 15:24, Marc Garcia <garcia.marc@gmail.com> wrote:

> Website: I'm planning to experiment with splitting the website in two (it'll be transparent for users). The website and the stable docs, which receive most of the traffic, can probably be hosted on Cloudflare Pages. [...] The rest of the docs (old versions and the dev version) can be hosted in OVHcloud bucket storage. [...]

Splitting like that makes sense! (_If_ it is within quota, we could maybe consider also keeping the dev docs there, and only moving the old docs to bucket storage?)

> - Faster CI: Our GitHub runners are small, and most builds take around one hour or more to finish. We should be able to use bigger OVH instances for our existing CI fairly easily, via their OpenStack API and CIrun.

I'm not familiar with CIrun, but quickly checking it, that would basically mean using our current GitHub Actions, but through their "self-hosted" runner feature?

_______________________________________________
Pandas-dev mailing list
Pandas-dev@python.org
https://mail.python.org/mailman/listinfo/pandas-dev
Some updates (the ones shared in yesterday's call, and some new ones).

The cloud (bucket) storage didn't seem convenient for several reasons, so I moved forward with a regular Ubuntu instance (the cheapest: 2 cores, 7 GB RAM, 24 EUR/month). I've now moved all the traffic to the new instance, and since we only serve static files, the instance seems to be more than enough to handle our traffic (I haven't seen CPU or RAM usage exceed 4% in the time I've been monitoring the resources). I've got a PR open (#49614) to start syncing our web/docs with the new server. In a few hours I'll stop nginx on the old server (I already confirmed there is no traffic there; since we use Cloudflare, our DNS changes are immediate). And in a few days I'll switch off the Rackspace instance.

Besides the open PR, the only missing piece is the benchmarks at pandas.pydata.org/speed. The link is not working now, since I haven't moved the benchmarks yet. But before moving this, we should also make the changes in the benchmarks repo, so benchmark results start synchronizing with the new server. Can someone with access to that server take care of it, please? (DM me for the new server info.)

On running the benchmarks in OVH: the VM instances don't seem to be stable enough to track performance over time, as was likely to happen. Full results of the tests I did are in this repo: https://gitlab.com/datapythonista/pandas_ovh_benchmarks. OVH is checking the best way to give us access to dedicated hardware; I'll continue with that once we've got it. In parallel, I'm planning some tests to see whether it's feasible to use valgrind's cachegrind (or equivalent) so that instead of monitoring time, we monitor CPU cycles. That would make benchmarking much easier and faster, since any hardware would work and benchmarks could run in parallel. With a dedicated server we're likely to only be able to use a single core to get stable results, which means we could only run one benchmark suite per server every 3 hours. But implementing it can be tricky.

About CIrun: as you say, Joris, it's like a middleman between our hardware (the OVH OpenStack API to create/delete instances) and GitHub Actions. We need to add an extra YAML file with the CIrun configuration, and other than that we should be able to use OVH hardware directly from our current CI jobs without changes (except, I assume, one entry to say which instance to use for the jobs running on OVH).

Please let me know of any feedback, in particular if you see any problem with our website that could be caused by the migration.

Cheers,

On Thu, Nov 10, 2022 at 12:43 AM Joris Van den Bossche <jorisvandenbossche@gmail.com> wrote:
> [...]
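To make the cachegrind idea above a bit more concrete: cachegrind reports instruction counts, which are far more stable across runs and machines than wall-clock time. A first building block would be parsing the summary it prints. This is my own sketch (not an existing pandas or asv tool), working from a made-up sample of the summary format:

```python
import re

# Hypothetical helper: extract the instruction count ("I refs") from
# the summary that valgrind's cachegrind prints when a program exits.
def instruction_count(cachegrind_stderr: str) -> int:
    match = re.search(r"I\s+refs:\s+([\d,]+)", cachegrind_stderr)
    if match is None:
        raise ValueError("no 'I refs' line found in cachegrind output")
    return int(match.group(1).replace(",", ""))

# Example summary in the shape cachegrind emits (the numbers are made up):
sample = """
==12345== I   refs:      1,234,567
==12345== I1  misses:        2,345
==12345== LLi misses:        1,234
"""

print(instruction_count(sample))  # parses to 1234567
```

A benchmark runner would invoke `valgrind --tool=cachegrind <benchmark>` in a subprocess and feed its stderr to a helper like this, comparing counts instead of timings between commits.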
> Besides the open PR, the only missing piece is the benchmarks at pandas.pydata.org/speed. The link is not working now, since I haven't moved the benchmarks yet. [...] Can someone with access to the server take care of it, please? (DM me for the new server info.)

The link https://asv-runner.github.io/asv-collection/pandas/ is being updated automatically. Can we point to this URL for now, given that we may be changing how the benchmarks are run? If it's desirable to have the benchmark results on the docs server, and our current solution is deemed to be the long-term one, I can work on the synchronization. However, I'm reluctant to put in that work if it's just going to go away, given the easier alternative.

Best,
Richard

On Wed, Nov 9, 2022 at 11:50 PM Marc Garcia <garcia.marc@gmail.com> wrote:
> [...]
Oh, I forgot we were not using the rendered asv website from the old server. We're using nginx, so I can easily make pandas.pydata.org/speed show the content from that URL. But I guess we can also check the benchmarks directly at the GitHub Pages URL; not sure if it makes a difference. Let me know if it's useful, and I'll set it up. Thanks for the info!

On Thu, Nov 10, 2022, 21:18 Richard Shadrach <rhshadrach@gmail.com> wrote:
> [...]
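Serving the GitHub Pages content under pandas.pydata.org/speed would amount to a small nginx proxy rule. A hypothetical sketch, not the actual pandas server configuration:

```nginx
# Illustrative only -- not the real pandas server config.
# Proxies requests for /speed/ to the GitHub Pages site that
# already hosts the rendered asv results.
location /speed/ {
    proxy_pass https://asv-runner.github.io/asv-collection/pandas/;
    proxy_set_header Host asv-runner.github.io;
}
```

The upside of proxying rather than redirecting is that the URL users see stays on pandas.pydata.org, so the location of the benchmark runner can change later without breaking links.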
Quick update about the new infrastructure.

- The new hosting for the website seems to be working just fine; no issues detected. I just stopped nginx on the old server, so if anything there is still being used, we'll hopefully notice. If there are no issues and no objections, I'll switch that server off in a few days.

- We should be able to start using dedicated hardware for the benchmarks from our OVH cloud account in December. It'll work like regular cloud instances, but on dedicated servers. We'll run some tests to try to get more stability in the benchmarks, and hopefully we can get something even better than what we had, once the OVH hardware is ready.

On Thu, Nov 10, 2022 at 9:31 PM Marc Garcia <garcia.marc@gmail.com> wrote:
> [...]
Hi all, an update on this. OVH added dedicated hardware to their public cloud, and we can now create metal instances on demand. I created one, I isolated a CPU, and I'm running our benchmark suite on one it for the last 50 commits of the project. Each run takes 2 hours in the cheapest dedicated server, which costs 170 EUR per month. I don't think more expensive instances will make a difference, since they seem to have more cores, but not sure if a single core (what we're using) would be faster. If everything looks good when the current run finishes, I'll set things up so benchmarks start running automatically in that machine, and results are published. In parallel, I'm doing some research and some tests to see if we can detect performance regressions in PRs. I don't think it makes much sense to do it with asv, but I think it's feasible to build something specific. I'll send an update with the results of the research and a proposal when I'm ready. Any feedback or whatever just let me know. On Tue, Nov 15, 2022 at 3:05 PM Marc Garcia <garcia.marc@gmail.com> wrote:
Quick update about the new infrastructure.
- New hosting for the website seems to be working just fine, no issues detected. I've stopped nginx on the old server, so if anything there is still being used, we'll hopefully notice. If there are no issues and no objections, I'll switch off the server in a few days.
- We should be able to start using dedicated hardware for the benchmarks from our OVH cloud account in December. It'll work like regular cloud instances, but with dedicated servers. We'll be doing some tests to try to get more stability in the benchmarks, and hopefully we can get something even more stable than what we've had until now once the OVH hardware is ready.
On Thu, Nov 10, 2022 at 9:31 PM Marc Garcia <garcia.marc@gmail.com> wrote:
Oh, I forgot we were no longer using the rendered asv website from the old server. We're using nginx, so I can easily make pandas.pydata.org/speed show the content from that URL. But I guess we can also check them directly at the GitHub Pages URL; not sure if it makes a difference.
Let me know if it's useful, and I'll set it up. Thanks for the info!
On Thu, Nov 10, 2022, 21:18 Richard Shadrach <rhshadrach@gmail.com> wrote:
Besides the open PR, the only missing thing is the benchmarks at pandas.pydata.org/speed. The link is not working now, since I haven't moved the benchmarks yet. But before moving them, we should also make the changes in the benchmarks repo, so benchmark results start synchronizing with the new server. Can someone with access to the server take care of it, please? (DM me for the new server info.)
The link https://asv-runner.github.io/asv-collection/pandas/ is being automatically updated. Can we point to this URL for now, given that we may be changing how the benchmarks are run? If it's desirable to have the benchmark results on the docs server and our current solution is deemed to be the long-term one, I can work on the synchronization. However, I'm reluctant to put in that work if it's just going to go away, given the easier solution.
Best, Richard
On Wed, Nov 9, 2022 at 11:50 PM Marc Garcia <garcia.marc@gmail.com> wrote:
Some updates (the ones shared in yesterday's call, and some new ones).
The cloud (bucket) storage didn't seem convenient for different reasons, so I moved forward with a regular Ubuntu instance (the cheapest: 2 cores, 7 GB of RAM, 24 EUR/month). I've now moved all the traffic to the new instance, and since we only serve static files, the instance seems to be more than enough to handle our traffic (I didn't see CPU or RAM usage exceed 4% in the time I've been monitoring the resources). I've got a PR open (#49614) to start syncing our web/docs with the new server. In a few hours I'll stop nginx on the old server (I confirmed there is already no traffic to it; since we use Cloudflare, our DNS changes are immediate). And in a few days I'll switch off the Rackspace instance.
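For context, serving a static site like ours needs very little server configuration; something along these lines is all nginx requires (an illustrative sketch with made-up paths, not the actual server config):

```nginx
server {
    listen 80;
    server_name pandas.pydata.org;

    # Everything is prebuilt static HTML; no application server needed.
    root /var/www/pandas;
    index index.html;

    location / {
        try_files $uri $uri/ =404;
    }
}
```

With Cloudflare caching in front, a setup like this explains why the instance barely registers any load.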
Besides the open PR, the only missing thing is the benchmarks at pandas.pydata.org/speed. The link is not working now, since I haven't moved the benchmarks yet. But before moving them, we should also make the changes in the benchmarks repo, so benchmark results start synchronizing with the new server. Can someone with access to the server take care of it, please? (DM me for the new server info.)
On running the benchmarks in OVH, the VM instances don't seem to be stable enough to keep track of performance over time, as was likely to be the case. Full results of the tests I did are in this repo: https://gitlab.com/datapythonista/pandas_ovh_benchmarks. OVH is checking the best way to give us access to dedicated hardware, and I'll continue with that once we've got it. In parallel, I'm planning to do some tests to see if it's feasible to use Valgrind's cachegrind (or an equivalent) to monitor CPU cycles instead of time. That would make benchmarking much easier and faster, as any hardware would work, and benchmarks could be run in parallel. With a dedicated server we're likely to only be able to use a single core to get stable results, which means we could only run one benchmark suite per server every 3 hours. But implementing it can be tricky.
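As a rough sketch of the cachegrind idea: a benchmark would be run under `valgrind --tool=cachegrind`, and the instruction counts of two commits compared instead of their timings, since instruction counts are (nearly) deterministic on any hardware. A hypothetical parser for the summary line that cachegrind prints:

```python
import re

def instruction_count(cachegrind_output):
    """Extract the total instruction count ("I refs") from
    cachegrind's summary output. Unlike wall-clock time, this
    number barely varies between runs or machines."""
    match = re.search(r"I\s+refs:\s+([\d,]+)", cachegrind_output)
    if match is None:
        raise ValueError("no 'I refs' line found")
    return int(match.group(1).replace(",", ""))

# Example summary line as printed by `valgrind --tool=cachegrind`:
sample = "==1234== I   refs:      1,489,066,954"
print(instruction_count(sample))
```

Comparing these counts between a PR and its base commit is the kind of "something specific" that could flag regressions without dedicated hardware.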
About CIrun, as you say Joris, it acts like a middleman between our hardware (the OVH OpenStack API to create/delete instances) and GitHub Actions. We need to add an extra YAML file with the CIrun configuration, and other than that we should be able to use OVH hardware directly from our current CI jobs without changes (except one entry to say which instance we want to use for the jobs running in OVH, I assume).
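For illustration, the extra YAML file would look something like the sketch below; the field names, instance type, and image name here are made up and should be checked against CIrun's documentation, not copied as-is:

```yaml
# .cirun.yml -- illustrative only; verify the schema in CIrun's docs.
runners:
  - name: openstack-runner
    cloud: openstack
    instance_type: b2-30        # hypothetical OVH flavor name
    machine_image: ubuntu-22-04 # hypothetical image name
    labels:
      - cirun-openstack
```

A workflow job would then opt in by setting its `runs-on` to the matching runner label.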
Please let me know of any feedback. In particular if you see any problem with our website that could be caused by the migration.
Cheers,
On Thu, Nov 10, 2022 at 12:43 AM Joris Van den Bossche < jorisvandenbossche@gmail.com> wrote:
On Sat, 5 Nov 2022 at 15:24, Marc Garcia <garcia.marc@gmail.com> wrote:
Hi all,
pandas has received a donation from OVHcloud <https://www.ovhcloud.com/> to support the project infrastructure, with OVHcloud public cloud credits (an initial amount of 10,000 EUR for a period of one year). OVH is open to sponsor longer term and also other projects of the ecosystem (or NumFOCUS as a whole), but we started with this to have feedback at a smaller scale first.
The credits will be used initially for: - Hosting of the pandas website - Running the pandas benchmarks - Speeding up the project CI
Below I detail what I have in mind for each. If anyone is interested in getting involved, or has ideas or comments, please let me know. I'll publish updates here as there is progress.
Website: I'm planning to experiment with splitting the website in two (it'll be transparent for users). The website and the stable docs, which receive most of the traffic, can probably be stored in Cloudflare Pages. We're already using Cloudflare as a CDN, so instead of using it as a cache, we can publish the documents there. The rest of the docs (old versions and the dev version) can be hosted in OVHcloud bucket storage. Response times may be a bit slower, but our website is bigger than the Cloudflare quota, and having rarely accessed old docs in a CDN seems unnecessary anyway.
Splitting like that makes sense! (_if_ it is within quota, we could maybe consider keeping the dev docs, and only move old docs to bucket storage?)
- Benchmarks: OVHcloud instances have guaranteed hardware, and we'll be checking whether that is enough for benchmark results to be consistent across runs, or whether there is too much variability and we need dedicated hardware. If consistency is good enough, that would be great: our benchmarks mostly use one core, so dedicated hardware would be a considerable waste of resources, since most servers have 16 cores or more. We'll discuss with OVH whether dedicated hardware is needed, as at the moment their public cloud doesn't offer it (there is an alpha for dedicated instances, but we need to check with them).
- Faster CI: Our GitHub runners are small, and most builds take an hour or more to finish. We should be able to use bigger OVH instances for our existing CI fairly easily, via their OpenStack API and CIrun.
I am not familiar with CIrun, but from a quick look, that would basically be using our current GitHub Actions workflows through their "self-hosted" runner feature?
Hi Marc,

Following up on this: can you confirm that we aren't using the Rackspace server anymore? And were you able to shut it down? Andy Terrel (cc'd) was wondering if we still used it, and I let him know that we probably didn't, but that you might have a better idea of the current state of things.

Thanks,
Tom
To the best of my knowledge we are not using the Rackspace server anymore. All I'm aware of regarding pandas infrastructure is on OVH and updated automatically from GitHub.

I'm not sure if the Rackspace pandas server still exists. I guess it does if we're having this conversation. ;)

I forget the exact details, but from what I remember we kept a shared spreadsheet with the state of all the NumFOCUS servers. They couldn't be cancelled online, and we had to call Rackspace. I think Andy did it once for some servers, but I'm not sure if we had more ready to be cancelled after that, that we never cancelled.

For context, the people at NumFOCUS weren't helping much with the job I was doing trying to lower the infrastructure bill, and they reverted many things I did related to the PyData websites, where there was the biggest mess and the highest cost. I decided to stop doing any work on the NumFOCUS infrastructure after that. So it's very possible that some things were left halfway done, and that the old pandas server (and others) weren't cancelled after we migrated off them. I don't remember the details, but the spreadsheet will likely have updated information on the state of each individual server.

Andy, if you don't find the spreadsheet let me know. I guess I can still find it.

Cheers,

On Fri, Sep 20, 2024, 23:31 Tom Augspurger <tom.w.augspurger@gmail.com> wrote:
Hi Marc,
Following up on this: can you confirm that we aren't using the Rackspace server anymore? And were you able to shut it down? Andy Terrel (cc'd) was wondering if we still used it, and I let him know that we probably didn't, but that you might have a better idea of the current state of things.
Thanks,
Tom
Yeah, the Rackspace server is still up. I'll remove networking to see if anyone screams.

I will dig around for the spreadsheet. I remember seeing it, but I only have two servers now that aren't managed by the PyData admins, so I probably don't need it much.

-- Andy

On Fri, Sep 20, 2024 at 10:48 AM Marc Garcia <garcia.marc@gmail.com> wrote:
To the best of my knowledge we are not using the rackspace server anymore. All I'm aware of regarding pandas infrastructure is on OVH and updated automatically from GitHub.
I'm not sure if the Rackspace pandas server still exists. I guess it does if we're having this conversation. ;)
I forget the exact details, but from what I remember we kept a shared spreadsheet with the state of all the NumFOCUS servers. They couldn't be cancelled online, and we had to call Rackspace. I think Andy did it once for some servers, but I'm not sure if we had more ready to be cancelled after that, that we never cancelled.
For context, the people at NumFOCUS weren't helping much with the job I was doing trying to lower the infrastructure bill, and they reverted many things I did related to the PyData websites, where there was the biggest mess and the highest cost. I decided to stop doing any work on the NumFOCUS infrastructure after that. So it's very possible that some things were left halfway done, and that the old pandas server (and others) weren't cancelled after we migrated off them. I don't remember the details, but the spreadsheet will likely have updated information on the state of each individual server.
Andy, if you don't find the spreadsheet let me know. I guess I can still find it.
Cheers,
On Fri, Sep 20, 2024, 23:31 Tom Augspurger <tom.w.augspurger@gmail.com> wrote:
Hi Marc,
Following up on this: can you confirm that we aren't using the Rackspace server anymore? And were you able to shut it down? Andy Terrel (cc'd) was wondering if we still used it, and I let him know that we probably didn't, but that you might have a better idea of the current state of things.
Thanks,
Tom
participants (5)
- Andy Terrel
- Joris Van den Bossche
- Marc Garcia
- Richard Shadrach
- Tom Augspurger