
Hi,

the original STM proposal spoke of HTM as a thing of the far future. Now Haswells are out and provide built-in HTM support in the form of TSX. In the near future I expect more and more systems to have it.

Are there plans to make PyPy use HTM if it is available on the system?

Regards,
Dimitri.

Hi Dimitri,

On Wed, Nov 27, 2013 at 9:17 AM, Dimitri Vorona <alendit@googlemail.com> wrote:
> the original STM proposal spoke of HTM as a thing of the far future. Now Haswells are out and provide built-in HTM support in the form of TSX. In the near future I expect more and more systems to have it.
> Are there plans to make PyPy use HTM if it is available on the system?

I don't know yet. I've just started playing with an Intel Haswell, and I'm getting slightly bad results in the form of too many random transaction aborts.

This holds across the whole range, from "small" transactions that only access some 20KB of data up to larger transactions of almost 768KB, which is impressively three times the size of the L2 cache; this seems to say that even the L3 cache can dedicate a part of its resources to storing the transaction cache lines.

But a naive extrapolation of the single-threaded results shows that, if we instead had 8 threads running with the same results, even on completely independent data, they would still abort too many transactions each. Whenever a transaction needs to be redone without HTM, it really needs to stop all other threads. So "too many" is meant in this sense: even if the abort rate is only 10-20% on each core, it's enough to prevent any scaling beyond just a couple of cores.

It may be that I'm missing something, like a way to learn where conflicts occur. But all in all it is unclear if this is good enough for PyPy (or CPython). The next step, which I might do anyway, would be to extract the general logic from the pypy-stm branch (most notably the numerous small conflict-avoiding changes) and try to run that with HTM. This probably requires writing a different GC, but at this point it should be easy to do experimentally.

A bientôt,
Armin.
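Editor's sketch of the scaling argument above, not Armin's actual measurement or method. It assumes a deliberately simple model: every transaction costs one unit of time, the HTM attempts run fully in parallel across the threads, and each aborted transaction is redone once without HTM while all other threads are stopped. The function name `speedup` and the abort rates used are illustrative assumptions.

```python
def speedup(threads, abort_rate):
    """Estimated speedup over a single thread that never aborts.

    Assumed model (not taken from the thread): per unit of work, the HTM
    attempts cost 1/threads of wall-clock time, and the aborted fraction
    is rerun serially while every other thread waits.
    """
    parallel_time = 1.0 / threads   # HTM attempts spread over all threads
    serial_time = abort_rate        # aborted transactions rerun alone
    return 1.0 / (parallel_time + serial_time)


if __name__ == "__main__":
    for abort_rate in (0.0, 0.10, 0.20):
        row = ", ".join(
            f"{n} threads: {speedup(n, abort_rate):.2f}x" for n in (1, 2, 4, 8)
        )
        print(f"abort rate {abort_rate:.0%} -> {row}")
```

Under these assumptions, 8 threads reach only about 4.4x at a 10% abort rate and about 3.1x at 20%, and the speedup can never exceed 1/abort_rate however many cores are added. A real system would likely do worse, since the model ignores the cost of stopping and restarting the other threads.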

My gut says that HTM would be something used to help make STM commits faster (if that's possible), not to replace PyPy's STM machinery entirely. And maybe with something like CPython, one could replace the GIL entirely with HTM, but you'd probably want to make the "ticks between releases" a lot shorter to reduce the chance of conflicts.

Things to try in my Copious Amounts of Free Time. :/

On Tue, Dec 3, 2013 at 1:47 PM, Armin Rigo <arigo@tunes.org> wrote:
> [...] But all in all it is unclear if this is good enough for PyPy (or CPython). [...]

_______________________________________________
pypy-dev mailing list
pypy-dev@python.org
https://mail.python.org/mailman/listinfo/pypy-dev
-- taa /*eof*/
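Editor's sketch of the "shorter ticks between releases" idea, under a strong assumption that is not from the thread: each interpreter tick inside a transaction independently has the same small chance of causing an abort. The function name `abort_probability` and the per-tick figure of 1% are purely illustrative.

```python
def abort_probability(ticks, per_tick_conflict):
    """Chance that a transaction spanning `ticks` ticks aborts at least once.

    Assumes independent, identically likely conflicts per tick, which is a
    simplification made only for this illustration.
    """
    return 1.0 - (1.0 - per_tick_conflict) ** ticks


if __name__ == "__main__":
    for ticks in (100, 10, 1):
        p = abort_probability(ticks, per_tick_conflict=0.01)
        print(f"{ticks:>3} ticks per transaction -> abort probability {p:.3f}")
```

In this model, shortening transactions lowers the abort probability of each one, which is the effect described above. It says nothing about the fixed cost of starting and committing many more transactions, and Armin's report of aborts even on small transactions suggests the independence assumption may not hold on real hardware.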
participants (3)
- Armin Rigo
- Dimitri Vorona
- Taavi Burns