Over the last few years, I spent most of my time on tests which fail reliably. Currently on buildbots, when a test fails, it is re-run alone in verbose mode. I focused on tests which failed both times. Almost all of these tests have been fixed.
Now comes another category: tests which fail randomly. For example, a test fails only once every 10 builds, and only when run in parallel, but then passes when re-run alone. Many of these failures are caused by race conditions: the tests weren't designed with race conditions in mind, but to be run on a fast desktop computer which only runs tests sequentially while other applications are idle.
It might be interesting to know somehow which tests fail randomly, on which worker, and at what frequency. Example of such a failure: https://bugs.python.org/issue36402
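To make the idea concrete, here is a rough sketch of how such a tally could look. It is not an existing buildbot feature: the record layout (worker, test, failed in the parallel run, failed when re-run alone) and the function name random_failure_rates are assumptions made up for illustration.

```python
from collections import Counter


def random_failure_rates(records):
    """Return {(test, worker): rate} for tests that failed randomly.

    Each record is a hypothetical tuple:
    (worker, test, failed_in_parallel, failed_when_rerun_alone).
    A failure counts as "random" when the test failed in the parallel
    run but passed when re-run alone.
    """
    runs = Counter()
    random_failures = Counter()
    for worker, test, failed_first, failed_rerun in records:
        key = (test, worker)
        runs[key] += 1
        if failed_first and not failed_rerun:
            random_failures[key] += 1
    # Only report (test, worker) pairs with at least one random failure.
    return {key: random_failures[key] / runs[key]
            for key in runs if random_failures[key]}
```

Given 10 builds of one test on one worker, with a single failure that passed on re-run, this reports a rate of 0.1 for that (test, worker) pair. Tests which fail both times are left out, since they belong to the first category.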
Night gathers, and now my watch begins. It shall not end until my death.