[Distutils] PyPI Rate Limiting

Mon Feb 10 10:48:44 CET 2014

On Sun, Feb 9, 2014 at 12:16 PM, Noah Kantrowitz <noah at coderanger.net> wrote:
>
> On Feb 9, 2014, at 1:13 AM, Robert Collins <robertc at robertcollins.net> wrote:
>
>> On 9 February 2014 19:28, Noah Kantrowitz <noah at coderanger.net> wrote:
>>>
>>> On Feb 8, 2014, at 6:25 PM, Robert Collins <robertc at robertcollins.net> wrote:
>>>
>>
>>>> 5/s sounds really low - if the RPCs take less than 200ms to answer
>>>> (and I sure hope they do), a single-threaded mirroring client (with
>>>> low latency to PyPI's servers, or with pipelined requests) can easily
>>>> exceed it. Most folk I know writing API servers aim for response times
>>>> in the single-digit to low tens of milliseconds... What is the 95th
>>>> percentile for PyPI to answer these problematic APIs?
>>>>
>>>
>>> If you are making lots of sequential requests, you should be putting a sleep in there. "As fast as possible" isn't a design goal; good service for all clients is.
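A minimal sketch of the client-side sleep being suggested, in Python. The function name `throttled_map` and its parameters are hypothetical, not part of pip or any PyPI mirroring tool; `fetch` stands in for whatever makes one request. It spaces the start of consecutive requests at least `1 / max_per_second` seconds apart and sleeps only when a request comes back faster than that.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def throttled_map(
    fetch: Callable[[T], R],
    items: Iterable[T],
    max_per_second: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[R]:
    """Call fetch(item) sequentially, starting at most max_per_second calls per second.

    Hypothetical helper for illustration; clock and sleep are injectable for testing.
    """
    min_interval = 1.0 / max_per_second
    results: List[R] = []
    next_allowed = clock()  # earliest time the next request may start
    for item in items:
        now = clock()
        if now < next_allowed:
            # The previous request finished early: wait out the rest of the interval.
            sleep(next_allowed - now)
        start = max(now, next_allowed)
        results.append(fetch(item))
        next_allowed = start + min_interval
    return results
```

Called as, say, `throttled_map(fetch_release_json, package_names, max_per_second=5)` (both names hypothetical). Note that a client whose requests already take longer than the interval never sleeps at all, which is the case Robert is describing.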
>>
>> As fast as possible (on the server side) and good service for all
>> clients are very tightly correlated (and some would say there is a
>> causative relationship in fact).
>>
>> On the client side, I totally support limiting concurrency, but I've
>> yet to see a convincing explanation for rate limiting already
>> serialised requests that doesn't boil down to 'assume the server is
>> badly written'. Note - I'm not assuming - or implying - that about
>> PyPI.
>
> I'm not sure what point you are trying to make. The server wouldn't artificially slow down requests, it (well, nginx) would just track requests and send 503s if limits are exceeded. Requests still complete as fast as possible, and we can ensure one client doesn't hog all the server resources.

I think he's asking: given that the problems are being caused by
"clients configured for high parallelism," why not choose a
rate-limiting method that won't affect clients accessing it in a
single-threaded fashion?  It's a reasonable question.
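To make the arithmetic behind that question concrete: a strictly serial client that waits for each response before sending the next request issues one request per round trip, so with a per-request time t (server time plus network latency, ignoring pipelining) its rate is:

```latex
r_{\text{serial}} = \frac{1}{t}, \qquad
t = 200\,\text{ms} \Rightarrow r_{\text{serial}} = 5\ \text{req/s}, \qquad
t = 20\,\text{ms} \Rightarrow r_{\text{serial}} = 50\ \text{req/s}
```

So a 5 req/s cap leaves a single-threaded client untouched only if each request takes at least 200 ms end to end. The 20 ms figure is an illustrative value in the range Robert mentions, not a measured PyPI number.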

Also, if the server isn't artificially slowing down requests, what
does "Client requests up to the burst limit [of 10 requests] will be
delayed to maintain a 5 req/s maximum" mean?
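One way to read that sentence, sketched in Python: nginx's `limit_req` (without `nodelay`) is documented as a "leaky bucket" limiter in which requests that exceed the configured rate but fit within `burst` are delayed until they can be processed at that rate, and requests beyond the burst are rejected (503 by default). The class below is a simplified model of that behaviour using the announced numbers (5 req/s, burst 10); it is not nginx's actual code, and the `LeakyBucket` name and `admit` method are illustrative.

```python
from typing import Optional


class LeakyBucket:
    """Simplified per-client leaky bucket in the spirit of nginx limit_req (not its exact code)."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate      # requests per second drained from the bucket
        self.burst = burst    # how many excess requests may wait in the bucket
        self.excess = 0.0     # requests queued ahead of the next arrival
        self.last: Optional[float] = None  # arrival time of the last admitted request

    def admit(self, now: float) -> Optional[float]:
        """Return how long (seconds) to delay this request, or None to reject it (503)."""
        if self.last is None:
            self.last = now
            return 0.0
        # Drain for the time elapsed since the last admitted request, add this one, floor at 0.
        excess = max(self.excess - self.rate * (now - self.last) + 1.0, 0.0)
        if excess > self.burst:
            return None  # over the burst: reject without counting it against the bucket
        self.excess = excess
        self.last = now
        # Hold the request until the ones queued ahead of it have drained at `rate`.
        return excess / self.rate
```

In this model, fifteen requests arriving at the same instant get the first one served immediately, the next ten held for 0.2 s, 0.4 s, ... 2.0 s, and the last four rejected. A strictly serial client whose responses come back in 20 ms is instead slowed to roughly one request every 200 ms but never rejected. That split between delaying and rejecting appears to be what the announced wording describes.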

--Chris