How can I integrate pyperf[ormance] w/ a heavy-duty memory profiler?

Hi there,
I apologize in advance for any potential dumbness in my question.
bytehound <https://github.com/koute/bytehound> is a cool profiler that works like this:
$ export MEMORY_PROFILER_LOG=warn
$ LD_PRELOAD=./libbytehound.so ./your_application
What I'm after is to plug it into every run of the benchmarks in pyperformance <https://github.com/python/pyperformance>. It seems like the steps to achieve this are:
- Fork pyperf <https://github.com/psf/pyperf>, since it is the actual benchmark-running engine.
- Modify pyperf so as to make command prepending possible somehow.
- Install this modified pyperf.
- Install pyperformance on top of the modified pyperf.
I do not have the knowledge or experience to tackle step 2 on my own. Could I have your opinion on the whole endeavor, and maybe some advice?
Thanks a lot, Christos Lamprakos

Hi,
Did you try to write a pyperf benchmark and run it with:
$ MEMORY_PROFILER_LOG=warn LD_PRELOAD=./libbytehound.so python3 mybench.py -o -v --inherit-environ=MEMORY_PROFILER_LOG,LD_PRELOAD
By default, pyperf removes most environment variables, so you have to explicitly specify which ones should be inherited.
pyperf spawns ~20 processes.
Does it work?
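To illustrate why --inherit-environ matters, here is a small sketch (not pyperf's actual code; spawn_worker and its allow-list are illustrative) of what spawning workers with a scrubbed environment looks like:

```python
import os
import subprocess
import sys

# Sketch (not pyperf's real implementation): spawn a worker process with
# a scrubbed environment, passing through only an explicit allow-list.
# This is roughly the behavior that --inherit-environ controls.
def spawn_worker(inherit=("PATH",)):
    env = {k: v for k, v in os.environ.items() if k in inherit}
    out = subprocess.run(
        [sys.executable, "-c", "import os; print(sorted(os.environ))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return out.stdout

os.environ["MEMORY_PROFILER_LOG"] = "warn"
# Not in the allow-list: the variable is scrubbed from the worker.
print("MEMORY_PROFILER_LOG" in spawn_worker())
# Explicitly inherited: the worker sees it.
print("MEMORY_PROFILER_LOG" in spawn_worker(("PATH", "MEMORY_PROFILER_LOG")))
```

This is why LD_PRELOAD and MEMORY_PROFILER_LOG must be listed in --inherit-environ: otherwise the worker processes that actually run the benchmark never see them.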
Victor
Speed mailing list -- speed@python.org To unsubscribe send an email to speed-leave@python.org https://mail.python.org/mailman3/lists/speed.python.org/ Member address: vstinner@python.org
-- Night gathers, and now my watch begins. It shall not end until my death.

Thank you for your response as well as for all your work on Python, Mr. Stinner!
I am currently facing some installation issues with bytehound, so it will be a while before I get back to you. Please allow me to ask this in the meantime:
As you point out, pyperf spawns lots of processes. I am interested only in those that actually run each benchmark. For instance, in the case of 2to3, I want to profile just the command that executes 2to3. If the inherit-environ option attaches the bytehound-related variables to the other ~19 processes besides the one running 2to3, tons of unwanted data will be collected. My experiments run on a resource-constrained device, so I would like to avoid that. Is this possible?
Secondly, why do you propose writing my own benchmark instead of using inherit-environ directly with pyperformance?
Last but not least, if installing bytehound proves too hard, I will need to fall back on heaptrack <https://github.com/KDE/heaptrack> or something similar. In that case the profiler is invoked by prepending 'heaptrack' to the command of interest, instead of setting an environment variable. Hence the question about command prepending in my previous email.
I understand that what I'm asking is not directly related to pyperformance's goals. But your package is, to the best of my knowledge, the only production-level entry point for studying various aspects of CPython on real applications. So does this benchmark-running command prepending feature sound (i) worthwhile and (ii) feasible to you? If yes, I would be glad to help!
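For what it's worth, the prepending itself seems trivial to sketch; the open question is where pyperf would expose it. A hypothetical helper (run_with_prefix is not a real pyperf API, just an illustration of the idea):

```python
import subprocess

# Hypothetical helper, NOT part of pyperf: prepend a profiler command
# (e.g. ["heaptrack"]) to the benchmark command before spawning it.
def run_with_prefix(prefix, cmd):
    return subprocess.run(list(prefix) + list(cmd),
                          capture_output=True, text=True)

# Using "env X=1" as a harmless stand-in for a profiler wrapper:
result = run_with_prefix(["env", "X=1"], ["sh", "-c", "echo $X"])
print(result.stdout)
```

A real version would presumably live where pyperf builds the worker command line, so that only the benchmark workers get wrapped.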
Thanks again, Christos
P.S. All this fuss is made in the context of trying to unearth similarities in application behavior that could drive dynamic memory allocation optimization. Example: if application A performs better with jemalloc and application B is similar to application A, then try linking B with jemalloc too and see what happens.
P.S.2. If the above sounds interesting, I could definitely use some guidance to make it happen. If it sounds stupid, I would be indebted to you for saving me from all the effort!

On Tue, Feb 8, 2022 at 11:28 AM Christos P. Lamprakos <cplamprakos@microlab.ntua.gr> wrote:
> As you point out, pyperf spawns lots of processes. I am interested only in those that actually run each benchmark. For instance, in the case of 2to3, I want to profile just the command that executes 2to3. If the inherit-environ option attaches the bytehound-related variables to the other ~19 processes besides the one running 2to3, tons of unwanted data will be collected. My experiments run on a resource-constrained device, so I would like to avoid that. Is this possible?
In pyperformance, you can use -b option to only run a single benchmark:
pyperformance run -b 2to3
pyperformance benchmarks can be found in pyperformance/benchmarks/. Each script is a standalone benchmark written with pyperf which can produce JSON output using the -o option: try --help.
> Last but not least, if installing bytehound proves too hard, I will need to fall back on heaptrack or something similar. In that case the profiler is invoked by prepending 'heaptrack' to the command of interest, instead of setting an environment variable. Hence the question about command prepending in my previous email.
pyperf includes two options to track memory usage:
--track-memory
--tracemalloc
It tracks memory in each worker process and then reports the mean memory usage with the standard deviation. Regular tools like "pyperf stats" work on it.
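In other words, pyperf aggregates one memory sample per worker process. A quick sketch of the aggregation (the sample values below are made up for illustration):

```python
import statistics

# Illustrative per-worker peak memory samples, in bytes (made-up values,
# one sample per pyperf worker process).
samples = [10.2e6, 10.5e6, 9.9e6, 10.1e6]

mean = statistics.mean(samples)
stdev = statistics.stdev(samples)
print(f"{mean / 1e6:.2f} MB +- {stdev / 1e6:.2f} MB")
```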
See pyperf doc for limitations: https://pyperf.readthedocs.io/en/latest/runner.html
Victor