[New-bugs-announce] [issue43888] GitHub Actions CI/CD `Coverage` job is broken on master

Sviatoslav Sydorenko report at bugs.python.org
Mon Apr 19 07:50:46 EDT 2021

New submission from Sviatoslav Sydorenko <svyatoslav at sydorenko.org.ua>:

I noticed that https://github.com/python/cpython/runs/2378199636 (a coverage job on the last commit on master at the time of writing) takes suspiciously long to complete.

I did some investigation and noticed that this job on the 3.9 branch succeeds (all of the job runs on the first page in the list are green — https://github.com/python/cpython/actions/workflows/coverage.yml?query=branch%3A3.9)

But then I took a look at the runs on master and discovered that the last successful run was 4 months ago — https://github.com/python/cpython/actions.html?query=is%3Asuccess+branch%3Amaster&workflow_file_name=coverage.yml.

The last success is https://github.com/python/cpython/actions/runs/444323166, and starting with https://github.com/python/cpython/actions/runs/444405699 it fails consistently.

Notably, all of the failures are caused by the job hitting the timeout after *6 hours*: the GitHub platform simply kills such jobs, since 6h is the default per-job timeout in GHA.

It's also important to mention that before every job started timing out (effectively burning 6 hours of GHA time for each merge and producing no useful reports), there were occasional 6h timeouts, but they weren't consistent.

Looking at past successful runs, on master and elsewhere, I haven't seen one take more than 1h35m to complete. Taking this as a baseline, I suggest setting a timeout for the whole job, or perhaps just for the step that actually runs coverage.

Action items:
* Set the job timeout in GHA to 1h40m (allowing a bit of extra time for exceptionally slow jobs), so that a failure/timeout is reported well before the 6h mark
* Figure out why this started happening in the first place.
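The arithmetic behind the proposed limit can be sketched as follows. This is a minimal illustration, assuming the 1h35m observed maximum and the 5-minute margin implied by the 1h40m proposal; the helper name `timeout_minutes` is hypothetical. In the workflow file, the resulting number would go under the job's (or the coverage step's) `timeout-minutes` key.

```python
from datetime import timedelta

# Longest successful coverage run observed in the report: 1h35m.
OBSERVED_MAX = timedelta(hours=1, minutes=35)
# Margin implied by the proposed 1h40m limit (an assumption: 5 extra minutes).
MARGIN = timedelta(minutes=5)
# Default per-job timeout on GitHub Actions: 360 minutes (6 hours).
GHA_DEFAULT = timedelta(hours=6)


def timeout_minutes(observed_max: timedelta, margin: timedelta) -> int:
    """Hypothetical helper: whole minutes for `timeout-minutes`, rounded up."""
    total_seconds = int((observed_max + margin).total_seconds())
    return -(-total_seconds // 60)  # ceiling division


minutes = timeout_minutes(OBSERVED_MAX, MARGIN)
print(f"timeout-minutes: {minutes}")  # timeout-minutes: 100

# Runner time no longer burned by each run that would otherwise hang until the default.
saved = GHA_DEFAULT - timedelta(minutes=minutes)
print(saved)  # 4:20:00
```

With this limit, each hung run would be cut off after 100 minutes instead of 360, saving roughly 4h20m of runner time per run.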

I'm going to send a PR addressing the first point, but feel free to pick up the investigation part; I don't expect to have time for it anytime soon.

P.S. For the record, the last timeout of this type on the 3.9 branch happened two months ago: https://github.com/python/cpython/actions.html?page=4&query=branch%3A3.9&workflow_file_name=coverage.yml.

messages: 391373
nosy: webknjaz
priority: normal
severity: normal
status: open
title: GitHub Actions CI/CD `Coverage` job is broken on master
type: crash

Python tracker <report at bugs.python.org>
