performance issue with context managers
Hi list,

I was toying around a bit with writing a statistical profiler in python, and came up with https://gist.github.com/jap/5584946

For reference, the main routine is:

    with Profiler() as p:
        with ProfilerContext("c1"):
            s = ""
            for t in range(100000):
                with ProfilerContext("c2"):
                    s = s + "a"
                s = s + "b"
        print p.get_data()

When running it on my local machine, this gives the following output:

    spaans@spaans-e6500:/tmp$ /usr/bin/time ~/xsrc/pypy-2.0/bin/pypy pmp.py
    Counter({'c1': 638})
    6.42user 0.42system 0:07.06elapsed 97%CPU (0avgtext+0avgdata 30160maxresident)k
    0inputs+0outputs (0major+8383minor)pagefaults 0swaps

    spaans@spaans-e6500:/tmp$ /usr/bin/time python pmp.py
    Counter({'c1': 18, 'c2': 3})
    0.23user 0.02system 0:00.25elapsed 98%CPU (0avgtext+0avgdata 8200maxresident)k
    0inputs+0outputs (0major+2226minor)pagefaults 0swaps

So, two things seem to happen: the pypy version is almost 30 times slower than the python version (but hey, string appending has poor performance), and it somehow does not trigger the "c2" context. Is the c2 context supposed to disappear?

If I change the main loop to

    for t in range(100000):
        with ProfilerContext("c2"):
            s = s + "a"

the output is still limited to

    spaans@spaans-e6500:/tmp$ /usr/bin/time ~/xsrc/pypy-2.0/bin/pypy pmp.py
    Counter({'c1': 156})
    1.61user 0.14system 0:01.78elapsed 98%CPU (0avgtext+0avgdata 30040maxresident)k

which seems credible, because when I change it to

    for t in range(100000):
        s = s + "a"

it's suddenly fast in pypy as well:

    0.03user 0.01system 0:00.05elapsed 96%CPU (0avgtext+0avgdata 8024maxresident)k

Removing all the threading.local stuff gives me the following performance data:

    spaans@spaans-e6500:/tmp$ /usr/bin/time python pmp.py
    Counter({'c1': 12, 'c2': 6})
    0.20user 0.01system 0:00.22elapsed 98%CPU (0avgtext+0avgdata 8192maxresident)k
    0inputs+0outputs (0major+2224minor)pagefaults 0swaps

    spaans@spaans-e6500:/tmp$ /usr/bin/time ~/xsrc/pypy-2.0/bin/pypy pmp.py
    Counter({'c1': 621})
    6.18user 0.42system 0:06.76elapsed 97%CPU (0avgtext+0avgdata 30084maxresident)k

which does not seem to differ that much.

Finally, to rule out object creation issues, main was changed to:

    with Profiler() as p:
        with ProfilerContext("c1"):
            p2 = ProfilerContext("c2")
            s = ""
            for t in range(100000):
                with p2:
                    s = s + "a"
                s = s + "b"
        print p.get_data()

but that still behaves similarly to the previous runs:

    spaans@spaans-e6500:/tmp$ /usr/bin/time ~/xsrc/pypy-2.0/bin/pypy pmp.py
    Counter({'c1': 624})
    6.25user 0.38system 0:06.69elapsed 99%CPU (0avgtext+0avgdata 29792maxresident)k

    spaans@spaans-e6500:/tmp$ /usr/bin/time python pmp.py
    Counter({'c2': 6, 'c1': 6})
    0.14user 0.02system 0:00.16elapsed 98%CPU (0avgtext+0avgdata 8188maxresident)k

(All of this on my trusty old Dell E6500 running Ubuntu 13.04 on amd64.)

Could someone help me get "c2" registered, or is this expected behaviour? :)

Cheers,
Jasper
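For readers who don't want to chase the gist link: below is a minimal sketch of what a signal-based statistical profiler of this shape can look like. The names Profiler, ProfilerContext, local_context_stack and get_data() come from this thread; the use of SIGPROF/setitimer, the sampling interval, and every other detail are assumptions for illustration, not necessarily the gist's actual code.

    import signal
    from collections import Counter

    class _Stack(object):
        # stands in for the threading.local stack mentioned later in the thread
        def __init__(self):
            self.data = []

    local_context_stack = _Stack()

    class ProfilerContext(object):
        # pushes its name onto the stack while the with-block runs
        def __init__(self, name):
            self.name = name
        def __enter__(self):
            local_context_stack.data.append(self.name)
        def __exit__(self, *exc):
            local_context_stack.data.pop()

    class Profiler(object):
        # samples the innermost active context on every SIGPROF tick
        def __init__(self, interval=0.001):
            self.interval = interval
            self.counts = Counter()
        def _sample(self, signum, frame):
            if local_context_stack.data:
                self.counts[local_context_stack.data[-1]] += 1
        def __enter__(self):
            signal.signal(signal.SIGPROF, self._sample)
            signal.setitimer(signal.ITIMER_PROF, self.interval, self.interval)
            return self
        def __exit__(self, *exc):
            signal.setitimer(signal.ITIMER_PROF, 0)
            signal.signal(signal.SIGPROF, signal.SIG_DFL)
        def get_data(self):
            return self.counts

With definitions of this shape, the main routine above runs unchanged, and get_data() returns a Counter of samples per innermost context.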
Hi Jasper,

On Thu, May 16, 2013 at 9:14 AM, Jasper Spaans <spaans@fox-it.com> wrote:
> I was toying around a bit with writing a statistical profiler in python, and came up with https://gist.github.com/jap/5584946
It's a statistical profiler based on signals: whenever a signal is delivered, it checks where it is and counts. What occurs is that the signal delivery points are a bit more restricted when running JITted code. The inner loop of your example:
    for t in range(100000):
        with ProfilerContext("c2"):
            s = s + "a"
is quickly compiled to machine code that does this:

    guard that t < 100000
    append "c2" to the list local_context_stack.data
    s = s + "a"
    remove the last item from local_context_stack.data
    guard that there was no signal
    jump back to the top of the loop

So it only checks for signals once per loop, at the end, instead of (as usual when interpreting) at random points during the loop. Signals will never be delivered while "c2" is on the local_context_stack...

A bientôt,

Armin.
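An aside that is not from the thread: since the JIT checks for signals only once per loop iteration, any signal-driven sampler will systematically miss contexts that are entered and exited within a single iteration. One way around that is to not sample at all and instead time the with-block deterministically in __enter__/__exit__. A sketch under those assumptions; TimedContext is a hypothetical name, not part of Jasper's gist and not Armin's suggestion:

    import time
    from collections import Counter

    class TimedContext(object):
        # accumulates wall-clock seconds per context name instead of
        # counting signal-driven samples
        totals = Counter()

        def __init__(self, name):
            self.name = name
        def __enter__(self):
            self._t0 = time.time()
            return self
        def __exit__(self, *exc):
            TimedContext.totals[self.name] += time.time() - self._t0

The trade-off: this adds a couple of calls of overhead to every iteration instead of a fixed sampling cost, but its totals cannot be distorted by where the JIT places its signal checks.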
On Thu, May 16, 2013 at 9:14 AM, Jasper Spaans <spaans@fox-it.com> wrote:
> [snip]
> So, two things seem to happen: the pypy version is almost 30 times slower
> than the python version (but hey, string appending has poor performance),
> and it somehow does not trigger the "c2" context.
Also, it's not that string concatenation has poor performance; it has quadratic performance. It's the same as in CPython: if you keep two references to s, your performance will plummet, because each s + "a" then has to copy the whole string. We simply don't have the refcount hack (CPython resizes the string in place when it holds the only reference). Please use l.append('a') and later ''.join(l) instead.
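To illustrate Maciej's point concretely (a sketch, not from the thread):

    # Quadratic: each iteration copies the whole string built so far,
    # so 100000 iterations touch on the order of 100000**2 characters.
    s = ""
    for t in range(100000):
        s = s + "a"

    # Linear: collect the pieces and join them once at the end.
    parts = []
    for t in range(100000):
        parts.append("a")
    s = "".join(parts)

CPython often hides the quadratic cost via its in-place resize trick, but as soon as a second reference to s exists, or on an implementation without reference counting, every concatenation really does copy.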
participants (3)

- Armin Rigo
- Jasper Spaans
- Maciej Fijalkowski