how to extend VTUNE support from pypy for application hot spot analysis

Hi,
Armin created a minimal working version that integrates VTUNE, in a branch at: https://bitbucket.org/pypy/pypy/branch/vtune
This works in the sense that statistically meaningful micro-architecture data can be collected. However, for hot spot analysis, we'd like to trace back to the Python application source code. We know the VTUNE hook point, but don't know how or where to start in the RPython source code. I ran gdb against the pypy binary and found it really hard to reach the entry point or the information we wanted.
Could anyone give us some hints?
Thanks,
Peter

Hi Peter,
On 29 July 2016 at 01:24, Wang, Peter Xihong <peter.xihong.wang@intel.com> wrote:
> Armin created a minimum working version incorporating VTUNE to a branch at: https://bitbucket.org/pypy/pypy/branch/vtune
> This works in a sense such that statistically meaningful micro-architecture data could be collected. However, for hot spot analysis, we’d like to trace back to the Python application source code.
Mapping raw addresses in JIT-generated assembly all the way back to Python is a bit difficult, but that's what vmprof tries to do. I can imagine a solution based on vmprof. For now, try running pypy from the vtune branch together with vmprof: https://github.com/vmprof/vmprof-python . I think the vmprof log file contains some of the information needed for the mapping we're looking for.
For now we're not actively developing the vtune branch any more, but maybe we should. I should add that my copy of vtune-amplifier-xe started refusing to work, with "Error - no valid license - license expired" and no hint about whether or how to renew that license, so that put me off.
A bientôt,
Armin.

Hi Armin,
As far as I know (my team members tried this), vmprof does not allow us to attach to a running process? We will evaluate https://github.com/vmprof/vmprof-python if you think it's doable.
Regarding the license, you can get a free version as an active Open Source contributor: https://software.intel.com/en-us/qualify-for-free-software, click on "Open Source Contributor". Please let me know how it works for you. Also, would you need a Windows version of VTUNE as well?
As far as features are concerned, hot spot analysis on JITed code, mapping to source code, and collecting other information such as microarchitecture statistics, all at the same time, are also important to us.
Best Regards,
Peter

On Tue, Aug 2, 2016 at 12:46 AM, Wang, Peter Xihong <peter.xihong.wang@intel.com> wrote:
> Hi Armin,
> As far as I know (my team members tried this), vmprof does not allow us to attach to a running process? We will evaluate https://github.com/vmprof/vmprof-python if you think it's doable.
You would need some form of process cooperation (I think), but it does not seem impossible. What I would do is run a separate thread that waits for some trigger (e.g. a write to a pipe) and then starts vmprof. Once started, vmprof is global to all threads.
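The trigger-thread idea can be sketched as follows. This is a minimal illustration, not code from vmprof or PyPy: `watch_trigger` and `VmprofSwitch` are hypothetical names, the one-byte protocol (`1` to start, `0` to stop) is an assumption, and the only vmprof calls used are `vmprof.enable(fileno)` and `vmprof.disable()`.

```python
import os
import threading


def watch_trigger(read_fd, start, stop):
    """Block on read_fd; b"1" calls start(), b"0" calls stop(), EOF exits."""
    running = False
    while True:
        data = os.read(read_fd, 1)
        if not data:  # all writers closed the pipe
            break
        if data == b"1" and not running:
            start()
            running = True
        elif data == b"0" and running:
            stop()
            running = False
    if running:  # do not leave the profiler enabled after EOF
        stop()


class VmprofSwitch:
    """Hypothetical wrapper; assumes the vmprof package is installed."""

    def __init__(self, path):
        self.path = path
        self.out = None

    def start(self):
        import vmprof  # lazy import: the watcher itself has no hard dependency
        self.out = open(self.path, "w+b")
        vmprof.enable(self.out.fileno())

    def stop(self):
        import vmprof
        vmprof.disable()
        self.out.close()
```

In an application, the watcher would run as a daemon thread, for example `threading.Thread(target=watch_trigger, args=(fd, switch.start, switch.stop), daemon=True).start()`, where `fd` is the read end of a pipe or a named FIFO opened with `os.open`. Passing `start` and `stop` as callables keeps the trigger logic independent of the profiler being switched on.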

Hi,
On 2 August 2016 at 10:09, Maciej Fijalkowski <fijall@gmail.com> wrote:
>> As far as I know (my team members tried this), vmprof does not allow us to attach to a running process? We will evaluate https://github.com/vmprof/vmprof-python if you think it's doable.
> You would need some form of process cooperation (I think) but it does not seem impossible. What I would do is I would run a separate thread that accepts something (e.g. a pipe write) and then starts vmprof. vmprof once started is global to all threads
Also, please note that I mentioned vmprof as a way to get started. You would need something *like* what vmprof does; for a clean solution you don't want to enable an additional profiler on top of your vtune code.
A bientôt,
Armin.

Hi Armin and Maciej,
Let us know once you have something we could actually try. One requirement is on the sampling/profiling overhead: ideally <1%; anything >5% could be troublesome. We'd like to allow people to do performance analysis on production systems.
Meanwhile, could I make this two separate requests:
1. Application hot spot analysis. Today I can run cProfile with CPython and get application code profiles running OpenStack Swift, but I can't do the same thing with PyPy.
2. Mapping JITed code (assembly) back to the application Python code. VTUNE integration with HHVM and node.js is complete and working today, and I hope to see the same capability with PyPy.
Thanks,
Peter
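The cProfile workflow and the overhead budget mentioned above can be illustrated with a small sketch. It assumes CPython's standard `cProfile` and `pstats` modules; `workload` is a hypothetical stand-in for application code, and the helper names are invented for this example. Single-run timings are noisy, so the resulting percentage is only a rough indication.

```python
import cProfile
import pstats
import time


def workload(n=200_000):
    """Hypothetical stand-in for application code."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def baseline_seconds(fn):
    """Wall-clock seconds for one unprofiled call of fn."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def profiled_run(fn, top=5):
    """Run fn under cProfile; return (seconds, hottest function names)."""
    profiler = cProfile.Profile()
    start = time.perf_counter()
    profiler.runcall(fn)
    elapsed = time.perf_counter() - start
    # Stats.stats maps (filename, lineno, funcname) -> (cc, nc, tt, ct, callers)
    table = pstats.Stats(profiler).stats
    ranked = sorted(table.items(), key=lambda item: item[1][3], reverse=True)
    return elapsed, [key[2] for key, _ in ranked[:top]]


def overhead_percent(baseline, profiled):
    """Relative slowdown of the profiled run, in percent."""
    return (profiled - baseline) / baseline * 100.0
```

Calling `overhead_percent(baseline_seconds(workload), profiled_run(workload)[0])` gives a rough overhead figure to compare against the <1% and >5% thresholds. A deterministic tracing profiler such as cProfile usually costs well above that budget, which is one reason sampling profilers are preferred on production systems.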

Hi Peter,
The first request is fulfilled by vmprof.
The second one can be worked on using the same mechanisms as vmprof: there is a C API that, given an assembler address, will give you the Python stack. I believe it's defined in rpython/jit/backend/llsupport/src/codemap.c.
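The general idea of resolving an assembler address to a source location can be sketched as a sorted table of address ranges with a binary search. This is an illustration of the technique only: it does not reproduce the interface or data layout of codemap.c, `AddressMap` and its methods are hypothetical, and it returns a single label where the real API is described as returning a Python stack. Ranges are assumed not to overlap.

```python
import bisect


class AddressMap:
    """Maps half-open address ranges [start, start + size) to source labels."""

    def __init__(self):
        self._starts = []   # sorted range start addresses
        self._entries = []  # (start, end, label), parallel to _starts

    def add(self, start, size, label):
        """Register a block of generated code; ranges must not overlap."""
        i = bisect.bisect_left(self._starts, start)
        self._starts.insert(i, start)
        self._entries.insert(i, (start, start + size, label))

    def lookup(self, addr):
        """Return the label whose range contains addr, or None."""
        # Find the last range starting at or before addr.
        i = bisect.bisect_right(self._starts, addr) - 1
        if i < 0:
            return None
        start, end, label = self._entries[i]
        return label if addr < end else None
```

A profiler hook would call `add` whenever the JIT emits a new block of machine code and `lookup` for each sampled instruction pointer; addresses that fall in gaps between ranges resolve to `None`.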

Hi Peter,
On 4 August 2016 at 08:05, Maciej Fijalkowski <fijall@gmail.com> wrote:
> The second one can be worked on using the same mechanisms as vmprof - there is C API that given the assembler address will give you the python stack. It's defined in rpython/jit/backend/llsupport/src/codemap.c I believe
I should add that vtune's particular license towards Open Source projects, trying to prevent any paid work from occurring on an open source project, did change my position about it: I now consider vtune as unfriendly to open source, as its license subtly prevents growth of the projects using it. I'm not usually the kind of guy that would go on a rant about license A or B, but this one strikes me as unworkable. This license is made by considering that commercial interests and open source never interact. I will thus reply to it by saying that I may consider helping you anyway---for a fee, with a regular commercial contract.
A bientôt,
Armin.

On Thu, Aug 4, 2016 at 8:37 AM, Armin Rigo <arigo@tunes.org> wrote:
> Hi Peter,
> On 4 August 2016 at 08:05, Maciej Fijalkowski <fijall@gmail.com> wrote:
>> The second one can be worked on using the same mechanisms as vmprof - there is C API that given the assembler address will give you the python stack. It's defined in rpython/jit/backend/llsupport/src/codemap.c I believe
> I should add that vtune's particular license towards Open Source projects, trying to prevent any paid work from occurring on an open source project, did change my position about it: I now consider vtune as unfriendly to open source, as its license subtly prevents growth of the projects using it. I'm not usually the kind of guy that would go on a rant about license A or B, but this one strikes me as unworkable. This license is made by considering that commercial interests and open source never interact. I will thus reply to it by saying that I may consider helping you anyway---for a fee, with a regular commercial contract.
> A bientôt,
> Armin.
I fully support Armin's position here. The notion that Open Source developers can somehow feed off pure air *or* not use the advantage of spending endless free hours to make some money strikes me as a very strange concept.

Hi Armin and Maciej,
We are looking into "rpython/jit/backend/llsupport/src/codemap.c" for the mapping. We really appreciate this tip.
Regarding the VTUNE license and open source, I fully understand the concern, and I have escalated the issue.
Thanks,
Peter

Hi Peter,
On 2 August 2016 at 00:46, Wang, Peter Xihong <peter.xihong.wang@intel.com> wrote:
> Regarding to license, you could get free version as an active Open Source contributor: https://software.intel.com/en-us/qualify-for-free-software, click on "Open Source Contributor".
"""I will not be compensated in any form for applications developed or maintained with Intel® Software Development Products under the terms of the non-commercial license""": this is the box I need to check. The Intel license thus does not seem to apply in pypy's case, even though it is and will remain open source. The exact text also seems to imply that I will never in the future be allowed to get money for my work; I would thus never recommend that anyone check this box, as it theoretically precludes them from ever getting paid to continue working on their open source project...
A bientôt,
Armin.

Hi Armin,
I am working to get you a special license. You will be contacted directly.
Thanks,
Peter
participants (3): Armin Rigo; Maciej Fijalkowski; Wang, Peter Xihong