Hi there,

I'm currently finishing up vmprof native profiling for CPython & PyPy. Here are some highlights (soon to be released in 0.4.0):

* Native profiling (C stack) is included in the profile (using libunwind) for Linux & Mac
* Windows 64bit support (no native profiling)
* The platform that reads the profile can be different from the platform that generated it
* vmprof.com updates to the flamegraph and general style changes
* Documentation updates

I'm happy to receive any feedback.

PyPy support is nearly finished, but there is one issue:

On my branch (called vmprof-native), libunwind is required to be available on the system. This means that we need to install it on the buildbots (which is fine), and libunwind suddenly becomes a dependency when building PyPy on Linux and Mac. Is the latter ok? Or should libunwind be an optional dependency?

Cheers,
Richard
Hi Richard, On 15 February 2017 at 11:22, Richard Plangger <planrichi@gmail.com> wrote:
On my branch (called vmprof-native), libunwind is required to be available on the system. Which means that we need to install those on the buildbots (which is fine) and libunwind suddenly becomes a dependency when building PyPy on Linux and Mac. Is the latter ok? Or should libunwind be an optional dependency?
Avoiding a hard dependency would be a good idea. Also, libunwind is a hack that showed problems when vmprof previously supported C frames, and it was removed for that reason. Maybe you should say a word about why re-enabling C frames with libunwind looks ok now?

A bientôt,
Armin.
Hi,
Avoiding a hard dependency would be a good idea. Also, libunwind is a hack that showed problems when vmprof previously supported C frames, and it was removed for that reason. Maybe you should say a word about why re-enabling C frames with libunwind looks ok now?
Can you elaborate on that? My understanding is that Maciej at some point complained that there were some issues and removed that feature (which Antonio was not particularly happy about). I never learned what the issues were. One complaint I remember (it should be in the IRC logs somewhere):

It was along the lines of: "... sometimes libunwind returns garbage ..."

Speaking from my experience during the development:

I have not seen a C stack trace that is totally implausible (even when I thought so, it turned out to be some other error). It was very hard to get the trampoline right, and it was also not easy to get the stack walking right (considering it should work for all platform combinations).

There are several cases where libunwind could say: 'hey, unsure what to do, cannot rebuild the stack', and those cases now lead to skipping the sample in the signal.

I'm happy to accept the fact that 'libunwind is garbage/hack/...' if somebody proves that. So if anyone has some insight on that, please speak up. There are alternatives, like using libbacktrace from gcc (which is already included in vmprof now).

Cheers,
Richard
There was definitely a massive problem with libunwind & JIT frames, which made it unsuitable for Windows and OS X.

Another issue was that libunwind made traces ten times bigger, for no immediate benefit other than "it might be useful some day", and it added complexity.

On Linux I was getting ~7% of stacks that were not correctly rebuilt. This is not an issue if you assume that the 7% is statistically distributed evenly, but I heavily doubt that is the case (and there is no way to check), which made us build a more robust approach.

I think I would like the following properties:

* everything works without libunwind, native=True raises an exception
* with libunwind, we don't lose frames in python just because libunwind is unable to reconstruct the stack
* we don't pay 10x storage just because there is an option to want native frames

Can we have that?

PS. How does the new approach work? If it always uses libunwind and ditches the original approach I'm very much -1

Cheers,
fijal

On Wed, Feb 15, 2017 at 12:10 PM, Richard Plangger <planrichi@gmail.com> wrote:
Hi,
Avoiding a hard dependency would be a good idea. Also, libunwind is a hack that showed problems when vmprof previously supported C frames, and it was removed for that reason. Maybe you should say a word about why re-enabling C frames with libunwind looks ok now?
Can you elaborate on that? My understanding is that Maciej at some point complained that there were some issues and removed that feature (which Antonio was not particularly happy about). I never learned what the issues were. One complaint I remember (it should be in the IRC logs somewhere):
It was along the lines of: "... sometimes libunwind returns garbage ..."
Speaking from my experience during the development:
I have not seen a C stack trace that is totally implausible (even when I thought so, it turned out to be some other error). It was very hard to get the trampoline right, and it was also not easy to get the stack walking right (considering it should work for all platform combinations).
There are several cases where libunwind could say: 'hey, unsure what to do, cannot rebuild the stack' and those cases now lead to skipping the sample in the signal.
I'm happy to accept the fact that 'libunwind is garbage/hack/...' if somebody proves that. So if anyone has some insight on that, please speak up. There are alternatives like: use libbacktrace from gcc (which is already included in vmprof now).
Cheers, Richard
Hi,

sorry about the lengthy email, but you asked for details :)

Windows is certainly out of scope (I'm not even sure anybody has succeeded in compiling libunwind on Windows; it probably needs lots of porting).

Some technical details on how it works now (not completely finished yet):

In the PyPy world:

* _U_dyn_register, exposed by libunwind, is used to register every piece of assembler generated by the JIT (_U_dyn_cancel is called when a loop token is finally collected); otherwise one cannot reliably do the matching between JIT trace <-> native libunwind symbol
* The stack PyPy maintains for its frames is still maintained as before

Did you do that back then for PyPy as well?

Stack walking:

Depending on vmprof.enable(native=True/False), either:

1) native=True: the C stack is walked, matching a special symbol __vmprof_eval_vmprof to the entries of 'kind' VMPROF_CODE_TAG. All other symbols exposed by pypy-c or libpypy-c.so are *ignored*. Most of them are internal functions of the interpreter loop. For PyPy this is not entirely true, because we include large parts of the standard library within libpypy-c.so, whereas CPython separates them into other shared objects. This simply means you cannot log stack frames within those shared objects/executable. Using the facility in libunwind for dynamic code (_U_dyn_register), one can match the JIT frames with 'kind' VMPROF_JITTED_TAG. All other symbols are considered native and are logged as kind VMPROF_NATIVE_TAG.

2) native=False: the stack is walked as it was before (iterating the list of PyPy stack frames).

This means the current setup lets you use whichever method you prefer.

Some notes about the properties:
* everything works without libunwind, native=True raises an exception
* with libunwind, we don't lose frames in python just because libunwind is unable to reconstruct the stack
Yes I agree, though I would like to have that in the same pypy-c executable, meaning that native=True does not raise, but simply fulfills the second property.
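To make the second property concrete, here is a minimal Python model of the fallback being discussed. It is only a sketch: the real sampling code lives in a C signal handler, and `UnwindError`, `sample_stack` and the `walk_native` callable are hypothetical names invented for illustration, not vmprof APIs.

```python
class UnwindError(Exception):
    """Hypothetical error: the native unwinder could not rebuild the C stack."""


def sample_stack(python_frames, walk_native, native=True):
    """Return (kind, frames) for one profiling sample.

    python_frames: frames taken from the interpreter's own frame list,
                   which is assumed to be always available.
    walk_native:   callable returning the merged native + Python stack,
                   raising UnwindError on failure; None models a build
                   without libunwind.
    """
    if native and walk_native is not None:
        try:
            return "native", walk_native()
        except UnwindError:
            # Do not drop the sample; fall back to the Python-level frames.
            pass
    return "python", list(python_frames)
```

With this shape, a failed unwind costs only the native detail of that one sample, and a build without libunwind silently degrades to the Python-only walk instead of raising.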
* we don't pay 10x storage just because there is an option to want native frames
As described above, filtering the internal symbols of libpypy-c.so/pypy-c greatly reduces the size of the resulting traces. If you use gdb from time to time, you know that 500 stack entries is nothing for CPython :). The filtering did a good job of making that much smaller: in some tests deep down in a pytest stack frame, 300 stack frames became 37 frames, and all but ~5 of them were Python-level stack frames.
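The filtering and tagging rules for the native=True walk can be modelled roughly as follows. This is a sketch under assumptions: the tag values are placeholders (the real tags are constants inside vmprof), `JitRegistry` only imitates the role that _U_dyn_register/_U_dyn_cancel play and is not the libunwind API, and frames are assumed to arrive as (address, symbol, shared_object) tuples.

```python
import bisect

# Placeholder values; only the tag names come from the description above.
VMPROF_CODE_TAG = "code"
VMPROF_JITTED_TAG = "jitted"
VMPROF_NATIVE_TAG = "native"

EVAL_SYMBOL = "__vmprof_eval_vmprof"
INTERNAL_OBJECTS = {"pypy-c", "libpypy-c.so"}


class JitRegistry:
    """Sorted, non-overlapping [start, end) address ranges of JIT code."""

    def __init__(self):
        self._starts = []
        self._ends = []

    def register(self, start, end):
        i = bisect.bisect_left(self._starts, start)
        self._starts.insert(i, start)
        self._ends.insert(i, end)

    def cancel(self, start):
        i = bisect.bisect_left(self._starts, start)
        if i < len(self._starts) and self._starts[i] == start:
            del self._starts[i]
            del self._ends[i]

    def contains(self, addr):
        # Find the last range starting at or before addr, then check its end.
        i = bisect.bisect_right(self._starts, addr) - 1
        return i >= 0 and addr < self._ends[i]


def classify(frames, jit):
    """Tag each frame, dropping interpreter-internal symbols."""
    tagged = []
    for addr, symbol, shared_object in frames:
        if jit.contains(addr):
            tagged.append((VMPROF_JITTED_TAG, symbol))
        elif symbol == EVAL_SYMBOL:
            tagged.append((VMPROF_CODE_TAG, symbol))
        elif shared_object in INTERNAL_OBJECTS:
            continue  # internal interpreter function: ignored
        else:
            tagged.append((VMPROF_NATIVE_TAG, symbol))
    return tagged
```

The eval symbol is checked before the internal-object filter because it lives inside pypy-c/libpypy-c.so itself; checking in the other order would discard every Python-level frame.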
On linux I was getting ~7% of stack that was not correctly rebuilt. This is not an issue if you assume that the 7% is statistically distributed evenly, but I heavily doubt that is the case (and there is no way to check) which made us build a more robust approach.
We could check that by logging 'trace was canceled' and saving a timestamp/counter for each signal. If a certain pattern occurred, you could tell the user: 'this profile has skipped lots of signals in a very short period, please run with native=False'.

Can you elaborate on the issue on PyPy + Mac OS X? I remember that you said that there are issues with the JIT code map. Which ones?

Cheers,
Richard
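The check proposed above, detecting many skipped signals in a very short period, could look something like this sliding-window sketch. The function name, the 0.1 second window and the threshold of 5 are arbitrary assumptions for illustration; nothing like this is claimed to exist in vmprof.

```python
from collections import deque


def has_skip_burst(skipped_timestamps, window=0.1, threshold=5):
    """Return True if at least `threshold` skipped samples fall within
    any span of `window` seconds.

    skipped_timestamps: times (in seconds) at which a sample was dropped
    because the stack could not be rebuilt.
    """
    recent = deque()
    for t in sorted(skipped_timestamps):
        recent.append(t)
        # Discard skips that are older than the window relative to t.
        while t - recent[0] > window:
            recent.popleft()
        if len(recent) >= threshold:
            return True
    return False
```

A profile reader could call this on the logged skip times and, when it returns True, show the 'please run with native=False' hint to the user.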
participants (3): Armin Rigo, Maciej Fijalkowski, Richard Plangger