While this is certainly an issue, I don't think it's the issue we're discussing here. Unreliability of coverage is largely mitigated by the fact that the main thing we pay attention to is "patch coverage", meaning the coverage of the lines the patch itself changes. If the new tests are non-deterministic, that shows up as patch coverage fluctuating from commit to commit on the branch (and a PR is rarely a single commit). This is as opposed to "coverage delta", which only compares overall coverage before and after, and is indeed somewhat unpredictable due to old / bad tests.
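To make the distinction concrete, here is a rough sketch of the two metrics. This is not how codecov computes them internally; the function names, line numbers, and the "flaky line 50" are all made up for illustration, and coverage is modelled as plain sets of line numbers.

```python
def patch_coverage(changed_lines, covered_lines):
    """Fraction of the lines touched by the patch that the tests executed."""
    if not changed_lines:
        return None  # the patch touched no measurable lines
    return len(changed_lines & covered_lines) / len(changed_lines)


def coverage_delta(covered_before, total_before, covered_after, total_after):
    """Change in whole-project coverage ratio, before vs. after the patch."""
    return len(covered_after) / total_after - len(covered_before) / total_before


# Hypothetical project: 100 lines before the patch, 80 of them covered.
covered_before = set(range(1, 81))
total_before = 100

# Hypothetical patch: adds lines 101-104, and its tests hit three of them.
changed = {101, 102, 103, 104}
total_after = 104

# Two CI runs of the same commit. Line 50 is old code exercised only by a
# flaky, pre-existing test, so it is covered in run A but not in run B.
run_a = covered_before | {101, 102, 103}
run_b = run_a - {50}

patch_a = patch_coverage(changed, run_a)
patch_b = patch_coverage(changed, run_b)
delta_a = coverage_delta(covered_before, total_before, run_a, total_after)
delta_b = coverage_delta(covered_before, total_before, run_b, total_after)
```

In this toy setup the patch coverage is identical across the two runs, because the flapping line is outside the patch, while the coverage delta differs between them.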
So I can say that when I've had to overrule codecov, it's almost never been because of flapping coverage lines outside of the patch under consideration (and the patches under consideration either have deterministic tests, or I ask the author to add them).
General improvements to build reliability often reduce coverage unreliability as well. As we've been using GitHub more, which makes build status and mergeability more visible to reviewers, we've been fixing lots of little build-reliability issues, and this problem continues to get smaller.
-glyph