[Distutils] PyPI Rate Limiting

Ernest W. Durbin III ewdurbin at gmail.com
Sat Feb 8 23:15:21 CET 2014

Since the launch of the new infrastructure for PyPI two weeks ago, I’ve been monitoring overall performance and reliability of PyPI for browsers, uploads, installers, and mirrors.

Overall I am very happy, but I have noticed an ongoing issue with latency spikes and 5xx errors. I believe these issues are not new; before the migration we simply did not have the logs or monitoring that came along with the new infrastructure.

The cause of these issues is clearly mirroring clients hitting PyPI with floods of requests at common cron intervals. Additionally, new mirrors coming online and performing their initial sync can easily cause extended periods of increased latency and errors for all users, especially if the number of workers configured to perform the sync is turned up.

On 2014-02-07 at about 00:00 UTC, PyPI was effectively DoS’d for 45 minutes while a major research lab performed a sync via bandersnatch. It appears their worker count may have been configured as high as 50.

The design of PEP 381 mirroring clients requires calls to the PyPI XML-RPC interface to obtain changelogs and package serial numbers. As a result, when clients are configured for high parallelism, our backends can be quickly overwhelmed.

In order to maintain quality of service for all clients, we will begin rate limiting requests to the following routes:

  - /pypi
  - /mirrors
  - /id
  - /oauth
  - /security

The initial rates will be limited to 5 req/s per IP with bursts of 10 requests allowed. Client requests up to the burst limit will be delayed to maintain a 5 req/s maximum. Any requests past the 10 request burst will receive an HTTP 429 response code per RFC 6585.
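
To illustrate how these numbers interact, here is a minimal Python sketch of a leaky-bucket limiter for a single client IP. It is only a model of the behaviour described above, not the actual configuration running on PyPI. The class name `LeakyBucket`, the method `arrive`, and the reading that at most 10 requests may be queued at once are assumptions made for the example.

```python
class LeakyBucket:
    """Illustrative model of the announced limit for one client IP (hypothetical names)."""

    def __init__(self, rate=5.0, burst=10):
        self.rate = rate      # sustained requests per second (5 req/s in the announcement)
        self.burst = burst    # requests that may be queued at once (assumed reading of "bursts of 10")
        self.backlog = 0.0    # requests currently queued ahead of a new arrival
        self.last = 0.0       # time of the previous arrival, in seconds

    def arrive(self, now):
        """Return (200, delay_seconds) if the request is accepted, or (429, None) if rejected."""
        # The queue drains at `rate` requests per second between arrivals.
        self.backlog = max(0.0, self.backlog - (now - self.last) * self.rate)
        self.last = now
        if self.backlog >= self.burst:
            # Past the burst allowance: reject with HTTP 429 (RFC 6585).
            return (429, None)
        # Accepted, but held back so that output never exceeds `rate`.
        delay = self.backlog / self.rate
        self.backlog += 1
        return (200, delay)


if __name__ == "__main__":
    bucket = LeakyBucket()
    # Fifteen requests arriving at the same instant from one IP.
    for i in range(15):
        print(i + 1, bucket.arrive(0.0))
```

In this model a client that sends 15 requests at once has the first 10 accepted, each delayed a little longer than the one before it, and the remaining 5 rejected with 429. A per-IP deployment would keep one such bucket per client address.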

Tuning these parameters will be painless, so if issues arise with mirroring clients we will be very responsive to necessary modifications.
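
For authors of mirroring clients, one possible way to cope with a 429 is to pause and retry with an increasing delay. The sketch below is a suggestion under assumptions, not something bandersnatch or any other client is known to do. The function `call_with_backoff` and its parameters are hypothetical, and `send` stands in for whatever function issues a request and returns the HTTP status code.

```python
import time


def call_with_backoff(send, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns a status other than 429 or attempts run out.

    `send` is a hypothetical callable that performs one request and returns
    its HTTP status code. `sleep` is injectable so the pause can be stubbed.
    """
    status = None
    for attempt in range(max_attempts):
        status = send()
        if status != 429:
            return status
        if attempt < max_attempts - 1:
            # Exponential backoff: 0.5s, 1s, 2s, ... (values chosen for illustration).
            sleep(base_delay * 2 ** attempt)
    return status


if __name__ == "__main__":
    # Simulated server that rejects the first two attempts.
    responses = iter([429, 429, 200])
    print(call_with_backoff(lambda: next(responses), sleep=lambda s: None))
```

Lowering the number of parallel workers is likely to help more than retrying, since the limit applies per IP rather than per connection.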

Note that the routes used by installation clients (`/packages` and `/simple`) will remain unaffected, as they are generally served from the CDN and do not impose as high an overhead on our backend processes.

This rate-limiting is to be considered an interim solution, as I plan to begin a discussion on some updates to mirroring infrastructure guidelines.