PyPI Migrated to New Infrastructure with Some Breakage
Today (Sat Jan 25, 2014) the Infrastructure team migrated PyPI to new infrastructure.

The old infrastructure was:

- a single database server managed by OSUOSL
- a pair of load balancers shared by all of the python.org services hosted on OSUOSL
- a single backend VM that served as everything else for PyPI

The VM that was acting as the backend server for PyPI was partially hand configured and partially set up to be managed by Chef. Additionally, it had an issue that caused it to kernel panic every so often, which had been the cause of a number of downtimes in the last few months. Because it was primarily configured and administered by hand, and because of the way it was set up, it was not feasible to have any sort of failover or spare.

The new infrastructure is:

- 2 Web VMs
- 2 Database servers in a Master/Slave configuration
- 2 PgPool servers pooling connections to the database servers and load balancing reads across them
- 2 GlusterFS servers backed by Cloud Block Storage acting as the file storage for packages and package docs
- 1 metrics server to handle updating the download counts as they come in from Fastly

All of the VMs are hosted on Rackspace’s Public Cloud and have their configuration completely controlled and managed using Salt. Going forward this will allow us to easily scale out as required, or kill malfunctioning servers and spin up new ones. Additionally, the setup has been arranged so that, where possible, there are two servers performing the same role, ideally in an Active/Active configuration but at least in a Master/Slave configuration. This should allow PyPI to be far more stable moving forward and make downtimes much easier to recover from.

The services are still fronted by Fastly’s CDN, and in the new infrastructure we’ve removed our load balancer and replaced it by having Fastly handle the load balancing for us. Additionally, we’ve recently set up a static mirror of PyPI that is updated once every minute. This is hosted on the Rackspace cloud as well, but in a separate data center from the rest of PyPI. Fastly is configured to fall back to this static mirror in the case that neither of the two web heads is functioning. This should ensure that even in the event of a catastrophic failure of the PyPI service, the bulk of package installations should hopefully remain working.

The bad news (and the “Breakage” from the subject) is that while the new infrastructure was being planned out, built, and migrated to, the “pypissh” package was forgotten. The pypissh package is an alternative way to upload packages to PyPI; however, because of the way it works, it is very difficult to provide the kind of HA support for it that we’ve set up for everything else. We don’t have any numbers for how many people are actively using this package, but looking at a roughly two-week chunk of PyPI’s download history, the pypissh package was downloaded 7 times by a browser and 7 times by pip. All other downloads were caused by the mirroring system.

As of right now pypissh is non-functional, and due to the difficulty of making the current setup highly available and monitoring it, and because it apparently has a very small set of users, we would like to effectively kill off this particular service. Additionally, the benefits of pypissh have been reduced now that PyPI is available over a TLS connection with a well-trusted certificate. My question to you is: is this something that distutils-sig is willing to have happen?
If we are to re-enable pypissh, we’ll need to write a new solution for doing it that can be properly HA’d, and we’d prefer to put our efforts into improving things for a much larger set of people.

So yeah, PyPI should be loads more stable and more reliable now.

----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
Thanks everyone who helped make this happen.
From my perspective* I believe the ssh upload mechanism was added to address security issues around the basic-auth-over-http method used historically. Now that uploads *may* be done over https, and those using the ssh method can move over to using twine or pip upload, I think it's reasonable to discontinue support for ssh uploads.
Richard

* as the guy who will be hassled if its loss is noticed ;)
On 26 Jan 2014 09:51, "Richard Jones" <richard@python.org> wrote:
Thanks everyone who helped make this happen.
Indeed - fine work! :)
From my perspective* I believe the ssh upload mechanism was added to address security issues around the basic-auth-over-http method used historically. Now that uploads *may* be done over https, and those using the ssh method can move over to using twine or pip upload, I think it's reasonable to discontinue support for ssh uploads.
Yes, I agree that pointing the (very few) pypissh users towards twine as a replacement is the most reasonable option at this point - we should chat to MvL about putting a notice to that effect in the pypissh README (IIRC, MvL is the creator of that upload option).

Cheers,
Nick.
Richard
* as the guy who will be hassled if its loss is noticed ;)
I'll get in touch with Martin.
Quoting Richard Jones <richard@python.org>:
Thanks everyone who helped make this happen.
From my perspective* I believe the ssh upload mechanism was added to address security issues around the basic-auth-over-http method used historically. Now that uploads *may* be done over https, and those using the ssh method can move over to using twine or pip upload, I think it's reasonable to discontinue support for ssh uploads.
There is one use case that still isn't addressed by any of the alternatives: automated uploads still require the password to be stored on disk. So if the laptop is stolen, the password may get stolen as well. With SSH upload, the authentication comes from the ssh-agent, which protects the credentials better (i.e. if the laptop is powered down, or requires the user to enter a password on access, the key is protected).

It has been suggested to resolve this using the keyring library (which would give the password the same protection that ssh-agent gives the private key), but a) I don't think it actually *has* been implemented, b) to properly implement it (i.e. without monkey-patching register/upload), it would have to be done in CPython, and c) that would require putting keyring into CPython, which could happen in Python 3.5 at the earliest. So I suggest that somebody does a), and then provides a package that works around b) and c) by monkeypatching distutils (just like pypissh does).

In any case, if you really choose to discontinue SSH access, I suggest that you also change the UI to drop registration of SSH keys, and then ultimately remove them from the schema. BTW, you can get an indication of how many users this might affect by checking how many users have keys registered.

Regards,
Martin
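[Since the keyring idea is only sketched in prose above, here is a minimal illustration of what step a) might look like, assuming the third-party keyring library; the service label, account name, and helper names are hypothetical placeholders, not part of any existing tool, and this is not the monkey-patching integration Martin describes.]

# Illustrative sketch only: keep the PyPI password in the operating system
# keyring instead of plain text in ~/.pypirc.
import getpass
import keyring

SERVICE = "pypi"            # hypothetical keyring service label
USERNAME = "example-user"   # hypothetical PyPI account name


def store_password():
    # One-time step: prompt for the password and store it in the OS keyring.
    keyring.set_password(SERVICE, USERNAME, getpass.getpass("PyPI password: "))


def get_password():
    # What an upload tool would call instead of reading a password from disk.
    password = keyring.get_password(SERVICE, USERNAME)
    if password is None:
        raise RuntimeError("no PyPI password stored in the keyring")
    return password


if __name__ == "__main__":
    store_password()
    print("Stored; an upload command could now fetch it via get_password().")

[The part Martin labels b) and c), getting distutils' register/upload commands to call something like get_password() without monkey-patching, is the piece that would need either a change in CPython or a pypissh-style wrapper package.]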
Hello,

On 26/01/2014 06:03, martin@v.loewis.de wrote:
There is one use case that still isn't addressed by any of the alternatives: automated uploads still require the password to be stored on disk. So if the laptop is stolen, the password may get stolen as well.
With SSH upload, the authentication comes from the ssh-agent, which protects the credentials better (i.e. if the laptop is powered-down, or requires the user to enter a password on access, the key is protected).
It has been suggested to resolve this using the keyring library (which would give the same protection to the password as ssh-agent to the private key) [...]
distutils can’t depend on keyring, but twine could. Regards
On Sat, Jan 25, 2014 at 3:38 PM, Donald Stufft <donald@stufft.io> wrote:
All of the VMs are hosted on Rackspace’s Public Cloud and have their configuration completely controlled and managed using Salt.
Can you say a little about the choice to use Salt instead of Chef? I don't really care either way, but am just curious. Is it because Salt is written in Python, or were there other reasons (functionality, etc)? --Chris
On Jan 25, 2014, at 7:04 PM, Chris Jerdonek <chris.jerdonek@gmail.com> wrote:
Can you say a little about the choice to use Salt instead of Chef? I don't really care either way, but am just curious. Is it because Salt is written in Python, or were there other reasons (functionality, etc)?
--Chris
I’d need to ask Ernest to be sure, but I believe it was mostly that he was more familiar with it. The fact that it was written in Python was a bonus as well ;) I don’t think that there was anything that Chef was missing or that Salt had over Chef, just familiarity of the person who did most of the work. I’ll double check with Ernest to make sure there wasn’t another reason :) ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
Congrats! Thanks for always making the PyPI infrastructure better and better. Where are the states stored?
s/states/salt states/
On Jan 25, 2014, at 11:15 PM, Kyle Kelley <rgbkrk@gmail.com> wrote:
Where are the states stored?
https://github.com/python/pypi-salt ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On 1/25/14, 6:38 PM, Donald Stufft wrote:
My question to you is, is this something that distutils-sig is willing to have happen? If we are to re-enable pypissh we’ll need to write a new solution to doing it that can be properly HA’d and we’d prefer to put our efforts into improving things for a much larger set of people.
+0 re: pypissh, but I am still interested in seeing OAuth support come back:

- https://bitbucket.org/pypa/pypi/issue/85/oauth-authorise-not-found-https-must-be

Any idea if this can come back as part of PyPI or if we have to wait for warehouse? Nice work on the infrastructure, thank you!

-- Alex Clark · http://about.me/alex.clark
It definitely looks like we've got some issues introduced in recent server migrations and reconfigurations. Things I'm aware of:

- OAuth is busted
- OpenID is confused and/or busted
- password reset is possibly busted
- pypissh is busted

Richard
Just a follow up.

On Jan 26, 2014, at 4:40 PM, Richard Jones <richard@python.org> wrote:
It definitely looks like we've got some issues introduced in recent server migrations and reconfigurations. Things I'm aware of:
- OAuth is busted
- OpenID is confused and/or busted
These two issues existed prior to the migration as far as I can tell.
- password reset is possibly busted
This has been fixed.
- pypissh is busted
This is still the case and was known from the first announcement. Additionally, it seems there is an issue with the health checks when connecting from Australia that causes folks to intermittently get the static page instead of the real PyPI.
Richard
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
Donald Stufft <donald <at> stufft.io> writes:
Just a follow up.
- OAuth is busted
- OpenID is confused and/or busted

These two issues existed prior to the migration as far as I can tell.
Correct. We've discussed OAuth in IRC and this ticket has existed since late last year:

- https://bitbucket.org/pypa/pypi/issue/85/oauth-authorise-not-found-https-must-be

I'm bringing it up now because I'm still interested in seeing it fixed. If I understand MvL correctly, it happened around the time of the CDN switch.

In any event, there is a portion of traffic going to/from PyPI unencrypted and PyPI needs it to be encrypted. This leads to the confusing error message when trying to do OAuth over "https": you talk https to the endpoint, and the endpoint (seemingly) responds "I need this to be https".
On Jan 27, 2014, at 7:28 AM, Alex Clark <aclark@aclark.net> wrote:
In any event, there is a portion of traffic going to/from PyPI unencrypted and PyPI needs it to be encrypted. This leads to the confusing error message when trying to do OAuth over "https": you talk https to the endpoint, and the endpoint (seemingly) responds "I need this to be https".
It’s very unlikely for anything to happen over non-HTTPS now. The backend servers for PyPI do not offer a non-HTTPS port, and Fastly has a blanket HTTP -> HTTPS redirect. Most likely the issue is just that PyPI isn’t realizing that it’s being accessed via HTTPS.
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
Donald Stufft <donald <at> stufft.io> writes:
Most likely the issue is just that PyPI isn’t realizing that it’s being accessed via HTTPS.
Can you say more about this? If that's the case, then it sounds like someone can send you a pull request to remove the (bogus?) https check and we are done.
On Jan 27, 2014, at 7:39 AM, Alex Clark <aclark@aclark.net> wrote:
Donald Stufft <donald <at> stufft.io> writes:
Most likely the issue is just that PyPI isn’t realizing that it’s being accessed via HTTPS.
Can you say more about this? If that's the case, then it sounds like someone can send you a pull request to remove the (bogus?) https check and we are done.
I haven’t looked into it yet, simply as a function of time. Obviously PyPI is checking somehow whether it’s being accessed via HTTPS, and obviously (due to the nature of the error) it doesn’t believe it is. Since I know that it shouldn’t be possible to access PyPI other than over HTTPS, I can only deduce that however it’s determining that it’s running behind HTTPS isn’t working for one reason or another. I don’t really know more than that at the moment.
----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
I haven’t looked into it yet, simply as a function of time. Obviously PyPI is checking somehow whether it’s being accessed via HTTPS, and obviously (due to the nature of the error) it doesn’t believe it is. Since I know that it shouldn’t be possible to access PyPI other than over HTTPS, I can only deduce that however it’s determining that it’s running behind HTTPS isn’t working for one reason or another.
I don’t really know more than that at the moment.
See the line in the ticket:

https://bitbucket.org/pypa/pypi/issue/85/oauth-authorise-not-found-https-must-be
https://bitbucket.org/pypa/pypi/src/099a6bb6e4f23f61d2dc2117d36f86fd3dfd57e2...

The "HTTPS" environment variable, which ought to have the value "on" if access came through https, is not set. The issue is probably this:

http://lists.unbit.it/pipermail/uwsgi/2010-August/000561.html

So a line

uwsgi_param HTTPS on;

in https://github.com/python/pypi-salt/blob/master/provisioning/salt/roots/salt... should help, or alternatively a conditional line in pypi's app.conf, conditioned on the scheme being https.

On dinsdale, the old nginx configuration had the line

uwsgi_param HTTPS $https if_not_empty;

so I'm pretty sure it worked when I moved the service to OSL.

Regards,
Martin
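[To make the failure mode above concrete, here is a minimal sketch, an illustration only and not PyPI's actual code, of how a WSGI application commonly decides whether a request arrived over HTTPS from the environ values the front-end server is supposed to set.]

# Illustrative only: a WSGI app inferring the request scheme from its environ.
# If nginx/uwsgi never set the "HTTPS" variable (or wsgi.url_scheme), the app
# concludes it is serving plain HTTP even though Fastly terminated TLS.
def is_https(environ):
    if environ.get("HTTPS", "").lower() in ("on", "1", "yes"):
        return True
    return environ.get("wsgi.url_scheme") == "https"


def application(environ, start_response):
    scheme = b"https" if is_https(environ) else b"http"
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [scheme]

[With a line like "uwsgi_param HTTPS on;" (or the old "uwsgi_param HTTPS $https if_not_empty;") in the nginx config, a check of this shape succeeds; without it, the application believes it is not behind HTTPS, which matches the symptom described in the ticket.]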
participants (9)
- "Martin v. Löwis"
- Alex Clark
- Chris Jerdonek
- Donald Stufft
- Kyle Kelley
- martin@v.loewis.de
- Nick Coghlan
- Richard Jones
- Éric Araujo