slow-ish multithreaded primitives

Hey all,

I wrote some threaded code and ran it under different versions of Python, and observed unexpectedly high overhead:

cpython 2.6.8, cpython 2.7.6, pypy 2.2.1: 14~53ms
cpython 3.3.4, pypy 2.1.0 beta py3 mode: <1ms

Is someone here interested in getting to the bottom of it? Or in getting pypy py2 mode in line with py3 mode?

The set up:
- 1 lock
- 2 threads
- each thread has its own condition variable
- both condition variables share the lock mentioned above
- threads wake each other up in turn

arch x86_64, linux 3.12.9, up to date libc/rt/pthread

Sorry I don't have code handy, it's part of a larger project, but if someone's interested, please reply and I'll hack up a short test case.

Thanks,
d.
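For readers who want to reproduce the set-up described above, here is a minimal sketch, not the original test case (which was never posted in this thread). It assumes Python 3; the names `ping_pong`, `rounds` and `state` are hypothetical. Two threads share one lock, each waits on its own condition variable, and they hand a turn back and forth.

```python
import threading
import time


def ping_pong(rounds=1000, timeout=None):
    """Two threads wake each other in turn; returns mean seconds per hand-off.

    timeout=None waits without a timeout; a float exercises the timed
    code path of Condition.wait().
    """
    lock = threading.Lock()
    # Both condition variables share the same underlying lock.
    conds = [threading.Condition(lock), threading.Condition(lock)]
    state = {"turn": 0}  # index of the thread allowed to run next

    def worker(me):
        other = 1 - me
        with lock:
            for _ in range(rounds):
                # Wait on this thread's own condition until it is our turn.
                while state["turn"] != me:
                    conds[me].wait(timeout)
                # Pass the turn on and wake the other thread.
                state["turn"] = other
                conds[other].notify()

    threads = [threading.Thread(target=worker, args=(i,)) for i in (0, 1)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return (time.time() - start) / (2 * rounds)
```

Comparing `ping_pong(timeout=None)` with `ping_pong(timeout=1.0)` on a given interpreter should show whether the timed wait is the slow path there; the absolute numbers will depend on the machine and interpreter.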

Hi Dima,

On 22 February 2014 20:51, Dima Tisnek <dimaqq@gmail.com> wrote:
> Right, I narrowed it down to condition.wait being much slower with a timeout than without.

Thanks! Fixed. Indeed, I simply took the version of lock.acquire() from the py3k branch (with support for timeout and interrupts), and applied it in the default branch, under the name lock._py3k_acquire(). Then simply fixing threading.py to use this solves the performance issue reported here.

I guess the same could be done with CPython --- it's just a performance fix --- but given the destructive approach of python-dev towards 2.7, I doubt it will be accepted.

A bientôt,
Armin.
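To illustrate why a timed wait can be so much slower, here is a rough sketch contrasting the two strategies. To the best of my knowledge, Python 2.7's `threading.py` implements `Condition.wait(timeout)` with a polling loop of non-blocking acquires and sleeps that double up to 50 ms, while Python 3.2+ passes the timeout to the lock itself. The helper names `polling_acquire`, `native_acquire` and `wakeup_latency` are hypothetical and are not PyPy's `_py3k_acquire`; the sketch assumes Python 3.3+.

```python
import threading
import time


def polling_acquire(lock, timeout):
    """Timed acquire emulated by polling, in the style of 2.7's threading.py."""
    endtime = time.monotonic() + timeout
    delay = 0.0005  # doubled before the first sleep, so sleeps start at 1 ms
    while True:
        if lock.acquire(False):  # non-blocking attempt
            return True
        remaining = endtime - time.monotonic()
        if remaining <= 0:
            return False
        # Sleep longer each time, capped at 50 ms and at the time remaining.
        delay = min(delay * 2, remaining, 0.05)
        time.sleep(delay)


def native_acquire(lock, timeout):
    """Timed acquire handled by the lock itself (Python 3.2+ signature)."""
    return lock.acquire(True, timeout)


def wakeup_latency(acquire, hold=0.2):
    """Returns (acquired, seconds between release and successful acquire)."""
    lock = threading.Lock()
    lock.acquire()
    released_at = []

    def releaser():
        time.sleep(hold)
        released_at.append(time.monotonic())
        lock.release()  # a plain Lock may be released from another thread

    t = threading.Thread(target=releaser)
    t.start()
    got = acquire(lock, 5.0)
    acquired_at = time.monotonic()
    t.join()
    return got, acquired_at - released_at[0]
```

With the polling variant, a waiter that has already backed off to the 50 ms cap only notices the release at its next poll, which would be consistent with the tens of milliseconds reported at the top of the thread; the native variant is woken by the release itself.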

Hi,

On 23 February 2014 19:54, Maciej Fijalkowski <fijall@gmail.com> wrote:

Yes, lock._py3k_acquire() has been added to the default PyPy and will be in the next release. It's not completely clear yet if the naming of this particular method is best. It could also be called lock._pypy_acquire(), or be instead a built-in function in the __pypy__ module.

The point is to not change the semantics of 2.7's lock.acquire(), as it would indeed require 2.8 --- and there is little point for just a couple of minor details like this one.

A bientôt,
Armin.

Armin, is there really a semantic change? Consider invocations valid in 2.7 (i.e. without a timeout argument): is it not the same then? I'd rather see improvement to existing Python programs :)

Should this code be in nightly builds? My original use case was much more convoluted than the minimal test script; I'd like to see if the original issue is also solved.

d.

On 23 February 2014 21:06, Armin Rigo <arigo@tunes.org> wrote:

Can I try to make a case for _py3k_acquire inclusion when using the context manager API?

Let's say a well-formed Python program always uses context managers, and thus timeouts are only supplied to condition.wait():

    c = threading.Condition()

    with c:
        while something:
            c.wait(some time)
        change state

    with c:
        c.notifyAll()

What is the semantic difference in the choice of the underlying implementation of c._Condition__lock._RLock__block.acquire vs _py3k_acquire? What could go wrong if c._Condition__lock.__enter__ was mapped to _py3k_acquire instead?

AFAIK the context manager API doesn't allow the user to pass blocking=0 here. Thus lock acquisition cannot time out. Seems pretty solid to me...

That still leaves signal handling. Is the concern here about the context in which the signal handler executes? The behaviour of the user program because a signal may be caught earlier? An unexpected exception site for KeyboardInterrupt?

d.

On 27 February 2014 15:54, Armin Rigo <arigo@tunes.org> wrote:
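The pattern sketched in the message above can be made runnable as follows. This is only an illustration under assumptions: it targets Python 3, uses `notify_all()` (the newer spelling of `notifyAll()`), and the names `items`, `consumer`, `producer` and `run_once` are hypothetical stand-ins for "something" and "change state".

```python
import threading


def run_once(value=42, timeout=0.1):
    """One consumer waits with a timeout; one producer changes state and notifies."""
    c = threading.Condition()
    items = []    # shared state guarded by the condition's lock
    results = []

    def consumer():
        with c:  # the lock is taken via __enter__, so no timeout is involved here
            while not items:
                c.wait(timeout)  # the only place a timeout is supplied
            results.append(items.pop())

    def producer():
        with c:
            items.append(value)
            c.notify_all()

    t = threading.Thread(target=consumer)
    t.start()
    producer()
    t.join()
    return results
```

Because the predicate is re-checked in a loop, the consumer behaves the same whether `wait()` returns because of a notification or because the timeout expired; only the latency differs.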

Oh, so sorry to have jumped the gun. Now that I have properly tested the nightly build, I see that the performance issue I saw is gone and that condition.acquire actually calls _py3k_acquire when a timeout argument is present.

d.

On 10 March 2014 09:38, Dima Tisnek <dimaqq@gmail.com> wrote:

participants (4)
- Armin Rigo
- Dima Tisnek
- Maciej Fijalkowski
- Mark Roberts