We're currently looking into adjusting the current Executor implementation (for both TPE and PPE) to not use daemon threads at all for subinterpreter compatibility, see
https://bugs.python.org/issue39812. Instead of using an atexit handler, there's some consideration for using a similar threading-specific "exit handler". Instead of being upon program exit, registered functions would be called just before all of the non-daemon threads are joined in `threading._shutdown()`.
As a result, I suspect that we are unlikely to implement something like the above. But, as a convenient side effect, it should somewhat address the current potentially surprising behavior.