ACTION NEEDED: upgrade your buildbot client configuration
Hi Buildbot owners,
You need to manually update your buildbot client configuration:
- Change the buildbot server host to: buildbot-api.python.org (it was buildbot.python.org previously)
- Change the keepalive to: 60 (seconds, it was usually 600 previously)
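If you have several workers to update, the two changes above can be scripted. The following is only a sketch: it assumes that your worker is configured through a buildbot.tac file which assigns buildmaster_host and keepalive at the start of a line. The helper name update_tac is hypothetical; check your own file before applying anything like this.

```python
import re

NEW_HOST = "buildbot-api.python.org"
NEW_KEEPALIVE = 60  # seconds


def update_tac(text):
    """Return the configuration text with the new host and keepalive.

    Assumption: the file contains lines of the form
    "buildmaster_host = ..." and "keepalive = ...".
    Lines that do not match are left untouched.
    """
    # Replace the value assigned to buildmaster_host.
    text = re.sub(
        r"^(buildmaster_host[ \t]*=[ \t]*).*$",
        lambda m: f"{m.group(1)}{NEW_HOST!r}",
        text,
        flags=re.MULTILINE,
    )
    # Replace the value assigned to keepalive.
    text = re.sub(
        r"^(keepalive[ \t]*=[ \t]*).*$",
        lambda m: f"{m.group(1)}{NEW_KEEPALIVE}",
        text,
        flags=re.MULTILINE,
    )
    return text
```

Read the file, pass its content to update_tac(), write the result back, and restart the client. Keep a backup of the original file in case the layout of your configuration differs from what is assumed here.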
You should also manually clean up the temporary directory (/tmp) to remove old "ccXXX" and "tmpXXX" files:
rm -f /tmp/{cc,tmp}*
Stop the client, remove these files, and then start the client again.
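If you prefer to see what gets deleted, the same cleanup can be sketched in Python. This is a more conservative variant of the rm command, not an exact equivalent: it only deletes regular files whose names start with "cc" or "tmp" directly inside the given directory, skips directories and symbolic links, and returns the list of removed paths. The function name remove_leaked is hypothetical.

```python
import glob
import os


def remove_leaked(tmp_dir):
    """Delete regular files named cc* or tmp* directly inside tmp_dir.

    Returns the sorted list of paths that were removed.
    """
    removed = []
    for prefix in ("cc", "tmp"):
        pattern = os.path.join(glob.escape(tmp_dir), prefix + "*")
        for path in glob.glob(pattern):
            # Only touch regular files: skip directories and symlinks.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                os.remove(path)
            except OSError:
                # Skip files that cannot be deleted (for example, files
                # owned by another user).
                continue
            removed.append(path)
    return sorted(removed)
```

Call remove_leaked("/tmp") while the client is stopped. Note that the pattern also matches "cc" and "tmp" files created by other programs, exactly like the shell command above.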
For example, on some Red Hat workers, /tmp contained up to 20 GB of these "temporary" files! These files are leaked by buildbot builds interrupted by closed TCP connections.
Note: we might pass the "-pipe" option to GCC to avoid leaking temporary "ccXXX" files when GCC is killed by an interrupted job. But I'm not sure how to do that.
--
So, what happened?
https://bugs.python.org/issue41642
One week ago, Ernest and Pablo migrated the buildbot server to a new machine with a larger disk. The old machine had only an 8 GB disk, which was frequently full because of the large (JUnit) XML files. The new machine is now behind a load balancer, which gives us clean X.509 certificates on HTTPS (https://buildbot.python.org/).
The TCP port 9020 used by buildbot clients also goes through the load balancer, for consistency with the whole PSF infrastructure. The problem is that the load balancer seems to close TCP connections which have been idle for longer than 60 seconds. Ernest extended the timeout to 24 hours, but it seems like the load balancer still closes the TCP connections of buildbot clients (workers).
Ernest exposed the TCP port 9020 directly to the Internet under a new host name: buildbot-api.python.org. This host is not behind the load balancer, and so is not affected by the closed TCP connections issue.
Please update your buildbot client configuration to use the new "buildbot-api.python.org" host. I also suggest using a keepalive of 60 seconds: it should prevent further TCP connection issues in the future (if one day the host goes behind a load balancer again).
Victor
Night gathers, and now my watch begins. It shall not end until my death.