Is clock-throttling of interest here?
It would be really annoying if the code that chooses a macro implementation had to guess how much power each core will consume, or had to dynamically pick a macro implementation based on the current frequencies of all the cores.
https://lemire.me/blog/2018/08/13/the-dangers-of-avx-512-throttling-myth-or-...
https://lemire.me/blog/2018/09/07/avx-512-when-and-how-to-use-these-new-inst...
Dan