Since the launch of the new infrastructure for PyPI two weeks ago, I’ve been monitoring overall performance and reliability of PyPI for browsers, uploads, installers, and mirrors.

Overall I am very happy, but have noticed an ongoing issue with latency spikes and 5xx errors. I believe these issues are not new, but we didn’t previously have any of the logs or monitoring that came along with the new infrastructure.

The cause of these issues is very apparently mirroring clients hitting PyPI with floods of requests at common cron intervals. Additionally, new mirrors coming online and performing their initial sync can easily cause extended periods of increased latency and errors for all users, especially if the number of workers configured to perform the sync is turned up.

On 2014-02-07 at about 00:00 UTC, PyPI was effectively DoS’d for 45 minutes while a major research lab was performing a sync via bandersnatch. It appears their worker count may have been configured as high as 50.

The design of PEP 381 mirroring clients requires calls to the PyPI XMLRPC interface to obtain changelogs and package serial numbers. As such, when clients are configured for high parallelism, our backends can be quickly overwhelmed.

In order to maintain quality of service for all clients, we will begin rate limiting requests to the following routes:

- /pypi
- /mirrors
- /id
- /oauth
- /security

The initial rates will be limited to 5 req/s per IP with bursts of 10 requests allowed. Client requests up to the burst limit will be delayed to maintain a 5 req/s maximum. Any requests past the 10 request burst will receive an HTTP 429 response code per RFC 6585.

Tuning these parameters will be painless, so if issues arise with mirroring clients we will be very responsive to necessary modifications.

Note that the routes used by installation clients (`/packages` and `/simple`) will remain unaffected, as they are generally served from the CDN and do not have as high an overhead in our backend processes.
This rate-limiting is to be considered an interim solution, as I plan to begin a discussion on some updates to mirroring infrastructure guidelines.
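The limits described in the announcement (5 req/s per IP, a burst of 10, delays within the burst, HTTP 429 beyond it) can be modelled as a leaky bucket. The following is a minimal sketch under that assumption; the class and method names are hypothetical, and it is not PyPI's actual configuration or code.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

RATE = 5.0   # sustained requests per second per IP (from the announcement)
BURST = 10   # requests that may queue beyond the steady rate (from the announcement)


@dataclass
class LeakyBucket:
    """Hypothetical per-IP limiter state; illustrative only."""
    rate: float = RATE
    burst: int = BURST
    excess: float = 0.0            # requests currently queued ahead of the steady rate
    last: Optional[float] = None   # arrival time of the last accepted request

    def handle(self, now: float) -> Tuple[int, float]:
        """Return (status, delay_seconds) for a request arriving at time `now`."""
        if self.last is None:
            # First request seen from this client: serve it immediately.
            self.last = now
            return 200, 0.0
        # The queue drains at `rate` as time passes; this request adds one entry.
        excess = max(0.0, self.excess - self.rate * (now - self.last) + 1.0)
        if excess > self.burst:
            # Past the burst allowance: reject and leave the state unchanged.
            return 429, 0.0
        self.excess, self.last = excess, now
        # Delay the request so accepted requests proceed at no more than `rate`.
        return 200, excess / self.rate


bucket = LeakyBucket()
# Twelve requests arriving at the same instant from one IP.
results = [bucket.handle(now=0.0) for _ in range(12)]
print(results[0], results[10], results[11])
```

In this model the first request is served immediately, the next ten are accepted with delays growing from 0.2 s to 2.0 s, and the twelfth is rejected with 429. A client that already spaces its requests 0.2 s apart never accumulates excess and is never delayed.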
On 9 February 2014 11:15, Ernest W. Durbin III <ewdurbin@gmail.com> wrote:
Since the launch of the new infrastructure for PyPI two weeks ago, I've been monitoring overall performance and reliability of PyPI for browsers, uploads, installers, and mirrors. The initial rates will be limited to 5 req/s per IP with bursts of 10 requests allowed. Client requests up to the burst limit will be delayed to maintain a 5 req/s maximum. Any requests past the 10 request burst will receive an HTTP 429 response code per RFC 6585.
5/s sounds really low - if the RPCs take less than 200ms to answer (and I sure hope they do), a single-threaded mirroring client (with low latency to PyPI's servers // pipelined requests) can easily hit it. Most folk I know writing API servers aim for response times in the single digits to low tens of ms... What is the 95th percentile for PyPI to answer these problematic APIs?

Can our infrastructure restrict concurrency etc. (e.g. if we have haproxy it can trivially limit by concurrency rather than rate)? That would be IMO a better metric for overload.
Tuning these parameters will be painless, so if issues arise with mirroring clients we will be very responsive to necessary modifications.
Note that the routes used by installation clients (`/packages` and `/simple`) will remain unaffected as they are generally served from the CDN, and do not have as high of an overhead in our backend processes.
This rate-limiting is to be considered an interim solution, as I plan to begin a discussion on some updates to mirroring infrastructure guidelines.
Ok, cool.

-Rob

--
Robert Collins <rbtcollins@hp.com>
Distinguished Technologist
HP Converged Cloud
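The arithmetic behind the 200 ms figure, and the concurrency-based alternative raised above, can be sketched as follows. This is an illustration only: `sequential_rate` and `ConcurrencyLimiter` are hypothetical names, not part of PyPI, haproxy, or any mirroring client.

```python
import threading
from collections import defaultdict


def sequential_rate(latency_s: float) -> float:
    """Requests per second a strictly serial client reaches at a given round-trip time."""
    return 1.0 / latency_s


class ConcurrencyLimiter:
    """Hypothetical per-client cap on in-flight requests, instead of a rate cap."""

    def __init__(self, max_in_flight: int) -> None:
        self.max_in_flight = max_in_flight
        self._in_flight = defaultdict(int)   # client IP -> requests being served
        self._lock = threading.Lock()

    def try_acquire(self, client_ip: str) -> bool:
        """Admit the request unless the client already has too many in flight."""
        with self._lock:
            if self._in_flight[client_ip] >= self.max_in_flight:
                return False
            self._in_flight[client_ip] += 1
            return True

    def release(self, client_ip: str) -> None:
        """Mark one of the client's requests as finished."""
        with self._lock:
            if self._in_flight[client_ip] > 0:
                self._in_flight[client_ip] -= 1


# A serial client with 200 ms round trips sits at the 5 req/s limit;
# at 50 ms it would issue about 20 req/s while using only one connection.
for latency in (0.2, 0.05):
    print(latency, sequential_rate(latency))
```

Under a concurrency cap of 1, the serial client is always admitted regardless of how quickly the server answers, while a client running 50 workers would have most of its simultaneous requests refused.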
On Feb 8, 2014, at 6:25 PM, Robert Collins <robertc@robertcollins.net> wrote:
On 9 February 2014 11:15, Ernest W. Durbin III <ewdurbin@gmail.com> wrote:
Since the launch of the new infrastructure for PyPI two weeks ago, I've been monitoring overall performance and reliability of PyPI for browsers, uploads, installers, and mirrors. The initial rates will be limited to 5 req/s per IP with bursts of 10 requests allowed. Client requests up to the burst limit will be delayed to maintain a 5 req/s maximum. Any requests past the 10 request burst will receive an HTTP 429 response code per RFC 6585.
5/s sounds really low - if the RPCs take less than 200ms to answer (and I sure hope they do), a single-threaded mirroring client (with low latency to PyPI's servers // pipelined requests) can easily hit it. Most folk I know writing API servers aim for response times in the single digits to low tens of ms... What is the 95th percentile for PyPI to answer these problematic APIs?
If you are making lots of sequential requests, you should be putting a sleep in there. "as fast as possible" isn't a design goal, it is good service for all clients. --Noah
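Putting a sleep between sequential requests, as suggested above, could look like the sketch below. `Pacer` is a hypothetical helper, and the 5 req/s value is taken from the announced limit; neither comes from bandersnatch or any other existing client.

```python
import time


class Pacer:
    """Hypothetical client-side throttle that spaces calls at least `interval` apart."""

    def __init__(self, max_per_second: float, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / max_per_second
        self._clock = clock            # injectable for testing
        self._sleep = sleep            # injectable for testing
        self._next_allowed = None      # earliest time the next call may start

    def wait(self) -> float:
        """Sleep only as long as needed to respect the interval; return seconds slept."""
        now = self._clock()
        slept = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            slept = self._next_allowed - now
            self._sleep(slept)
            now = self._next_allowed
        self._next_allowed = now + self.interval
        return slept


pacer = Pacer(max_per_second=5)
for serial in range(3):
    pacer.wait()   # returns at once if the previous request was slow enough
    # ... issue one request here, e.g. a changelog call for `serial` ...
```

Because the pacer only sleeps for the remainder of the interval, a client whose requests already take longer than 200 ms is not slowed down further.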
On 9 February 2014 19:28, Noah Kantrowitz <noah@coderanger.net> wrote:
On Feb 8, 2014, at 6:25 PM, Robert Collins <robertc@robertcollins.net> wrote:
5/s sounds really low - if the RPCs take less than 200ms to answer (and I sure hope they do), a single-threaded mirroring client (with low latency to PyPI's servers // pipelined requests) can easily hit it. Most folk I know writing API servers aim for response times in the single digits to low tens of ms... What is the 95th percentile for PyPI to answer these problematic APIs?
If you are making lots of sequential requests, you should be putting a sleep in there. "as fast as possible" isn't a design goal, it is good service for all clients.
As fast as possible (on the server side) and good service for all clients are very tightly correlated (and some would say there is a causative relationship in fact).

On the client side, I totally support limiting concurrency, but I've yet to see a convincing explanation for rate limiting already serialised requests that doesn't boil down to 'assume the server is badly written'. Note - I'm not assuming - or implying - that about PyPI.

-Rob

--
Robert Collins <rbtcollins@hp.com>
Distinguished Technologist
HP Converged Cloud
On Feb 9, 2014, at 4:13 AM, Robert Collins <robertc@robertcollins.net> wrote:
On 9 February 2014 19:28, Noah Kantrowitz <noah@coderanger.net> wrote:
On Feb 8, 2014, at 6:25 PM, Robert Collins <robertc@robertcollins.net> wrote:
5/s sounds really low - if the RPCs take less than 200ms to answer (and I sure hope they do), a single-threaded mirroring client (with low latency to PyPI's servers // pipelined requests) can easily hit it. Most folk I know writing API servers aim for response times in the single digits to low tens of ms... What is the 95th percentile for PyPI to answer these problematic APIs?
If you are making lots of sequential requests, you should be putting a sleep in there. "as fast as possible" isn't a design goal, it is good service for all clients.
As fast as possible (on the server side) and good service for all clients are very tightly correlated (and some would say there is a causative relationship in fact).
On the client side, I totally support limiting concurrency, but I've yet to see a convincing explanation for rate limiting already serialised requests that doesn't boil down to 'assume the server is badly written'. Note - I'm not assuming - or implying - that about PyPI.
-Rob
-- Robert Collins <rbtcollins@hp.com> Distinguished Technologist HP Converged Cloud _______________________________________________ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig
PyPI is a terrible code base that *is* pretty bad. There is an effort underway to make this better, but right now assuming that is a pretty safe bet :)

It’s a really old code base (12 years or so) and has never had anyone on it full time, so over the years it has accumulated lots of cruft, and it comes from a time when web development practices were not very good.

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
On Feb 9, 2014, at 1:13 AM, Robert Collins <robertc@robertcollins.net> wrote:
On 9 February 2014 19:28, Noah Kantrowitz <noah@coderanger.net> wrote:
On Feb 8, 2014, at 6:25 PM, Robert Collins <robertc@robertcollins.net> wrote:
5/s sounds really low - if the RPCs take less than 200ms to answer (and I sure hope they do), a single-threaded mirroring client (with low latency to PyPI's servers // pipelined requests) can easily hit it. Most folk I know writing API servers aim for response times in the single digits to low tens of ms... What is the 95th percentile for PyPI to answer these problematic APIs?
If you are making lots of sequential requests, you should be putting a sleep in there. "as fast as possible" isn't a design goal, it is good service for all clients.
As fast as possible (on the server side) and good service for all clients are very tightly correlated (and some would say there is a causative relationship in fact).
On the client side, I totally support limiting concurrency, but I've yet to see a convincing explanation for rate limiting already serialised requests that doesn't boil down to 'assume the server is badly written'. Note - I'm not assuming - or implying - that about PyPI.
I'm not sure what point you are trying to make. The server wouldn't artificially slow down requests, it (well, nginx) would just track requests and send 503s if limits are exceeded. Requests still complete as fast as possible, and we can ensure one client doesn't hog all the server resources. --Noah
On Sun, Feb 9, 2014 at 12:16 PM, Noah Kantrowitz <noah@coderanger.net> wrote:
On Feb 9, 2014, at 1:13 AM, Robert Collins <robertc@robertcollins.net> wrote:
On 9 February 2014 19:28, Noah Kantrowitz <noah@coderanger.net> wrote:
On Feb 8, 2014, at 6:25 PM, Robert Collins <robertc@robertcollins.net> wrote:
5/s sounds really low - if the RPCs take less than 200ms to answer (and I sure hope they do), a single-threaded mirroring client (with low latency to PyPI's servers // pipelined requests) can easily hit it. Most folk I know writing API servers aim for response times in the single digits to low tens of ms... What is the 95th percentile for PyPI to answer these problematic APIs?
If you are making lots of sequential requests, you should be putting a sleep in there. "as fast as possible" isn't a design goal, it is good service for all clients.
As fast as possible (on the server side) and good service for all clients are very tightly correlated (and some would say there is a causative relationship in fact).
On the client side, I totally support limiting concurrency, but I've yet to see a convincing explanation for rate limiting already serialised requests that doesn't boil down to 'assume the server is badly written'. Note - I'm not assuming - or implying - that about PyPI.
I'm not sure what point you are trying to make. The server wouldn't artificially slow down requests, it (well, nginx) would just track requests and send 503s if limits are exceeded. Requests still complete as fast as possible, and we can ensure one client doesn't hog all the server resources.
I think he's saying that, given that the problems are being caused by "clients configured for high parallelism," why not choose a rate-limiting method that won't impact clients accessing it in a single-threaded fashion. It's a reasonable question.

Also, if the server isn't artificially slowing down requests, what does, "Client requests up to the burst limit [of 10 requests] will be delayed to maintain a 5 req/s maximum" mean?

--Chris
On Feb 10, 2014, at 1:48 AM, Chris Jerdonek <chris.jerdonek@gmail.com> wrote:
On Sun, Feb 9, 2014 at 12:16 PM, Noah Kantrowitz <noah@coderanger.net> wrote:
On Feb 9, 2014, at 1:13 AM, Robert Collins <robertc@robertcollins.net> wrote:
On 9 February 2014 19:28, Noah Kantrowitz <noah@coderanger.net> wrote:
On Feb 8, 2014, at 6:25 PM, Robert Collins <robertc@robertcollins.net> wrote:
5/s sounds really low - if the RPCs take less than 200ms to answer (and I sure hope they do), a single-threaded mirroring client (with low latency to PyPI's servers // pipelined requests) can easily hit it. Most folk I know writing API servers aim for response times in the single digits to low tens of ms... What is the 95th percentile for PyPI to answer these problematic APIs?
If you are making lots of sequential requests, you should be putting a sleep in there. "as fast as possible" isn't a design goal, it is good service for all clients.
As fast as possible (on the server side) and good service for all clients are very tightly correlated (and some would say there is a causative relationship in fact).
On the client side, I totally support limiting concurrency, but I've yet to see a convincing explanation for rate limiting already serialised requests that doesn't boil down to 'assume the server is badly written'. Note - I'm not assuming - or implying - that about PyPI.
I'm not sure what point you are trying to make. The server wouldn't artificially slow down requests, it (well, nginx) would just track requests and send 503s if limits are exceeded. Requests still complete as fast as possible, and we can ensure one client doesn't hog all the server resources.
I think he's saying that, given that the problems are being caused by "clients configured for high parallelism," why not choose a rate-limiting method that won't impact clients accessing it in a single-threaded fashion. It's a reasonable question.
Also, if the server isn't artificially slowing down requests, what does, "Client requests up to the burst limit [of 10 requests] will be delayed to maintain a 5 req/s maximum" mean?
Any requests beyond the rate limits will get an HTTP 503 with an empty body. --Noah
On Sat, Feb 08, 2014 at 05:15:21PM -0500, Ernest W. Durbin III wrote:
The initial rates will be limited to 5 req/s per IP with bursts of 10 requests allowed. Client requests up to the burst limit will be delayed to maintain a 5 req/s maximum. Any requests past the 10 request burst will receive an HTTP 429 response code per RFC 6585.
On Mon, Feb 10, 2014 at 01:49:53AM -0800, Noah Kantrowitz wrote:
On Feb 10, 2014, at 1:48 AM, Chris Jerdonek <chris.jerdonek@gmail.com> wrote:
Also, if the server isn't artificially slowing down requests, what does, "Client requests up to the burst limit [of 10 requests] will be delayed to maintain a 5 req/s maximum" mean?
Any requests beyond the rate limits will get an HTTP 503 with an empty body.
So is it 429 or 503? And is there a delay or not?

Marius Gedminas
--
Linux is a fast moving project, with very fast evolving components. If you're using an older distribution, older than 4 to 6 months (and anything with "Enterprise" in the name is by definition old), please consider going to a newer distribution.
  -- http://www.linuxpowertop.org/known.php
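Until the question of 429 versus 503 is settled, a mirroring client could treat both codes as a signal to back off. The sketch below assumes exactly that; `call_with_backoff` and its parameters are hypothetical, and `send` stands in for whatever function issues the request and returns the HTTP status code.

```python
import time

# Both status codes mentioned in the thread are treated as "slow down".
RETRYABLE = {429, 503}


def call_with_backoff(send, max_retries=5, base=1.0, cap=60.0, sleep=time.sleep):
    """Call `send()` and retry with exponential backoff while it is rate limited.

    `send` is a hypothetical callable returning an HTTP status code.
    Returns the last status seen, which may still be 429 or 503.
    """
    status = send()
    delay = base
    for _ in range(max_retries):
        if status not in RETRYABLE:
            break
        sleep(min(delay, cap))   # wait before trying again, never longer than `cap`
        delay *= 2               # double the wait after each refusal
        status = send()
    return status


# Example with a stand-in for the real request function.
responses = iter([503, 429, 200])
print(call_with_backoff(lambda: next(responses), sleep=lambda s: None))
```

Handling both codes the same way keeps the client correct whichever response the server ends up sending for requests beyond the limit.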
participants (6)

- Chris Jerdonek
- Donald Stufft
- Ernest W. Durbin III
- Marius Gedminas
- Noah Kantrowitz
- Robert Collins