Small improvements to the profile/cProfile API
Hi folks, Every time I do some Python profiling (with cProfile) the API feels kinda baroque, and I have to write ~10 line helper functions to do what I want. For example, if I want the return value of running some function and the profiling output as a string (e.g., to send as part of a web response), I have to do something like this: import cProfile import io import pstats def profile(func, *args, **kwargs): profiler = cProfile.Profile() result = profiler.runcall(func, *args, **kwargs) stream = io.StringIO() stats = pstats.Stats(profiler, stream=stream) stats.sort_stats('cumulative') stats.print_stats(strip_dirs=True) return (result, stream.getvalue()) Something like that might be a useful function in its own right, but I took a look at the current API, and also an open issue that addresses some of this (https://bugs.python.org/issue9285 last touched almost 4 years ago), and came up with the following: 1) Add a format_stats() function to Profile to give the profiling results as a string (kind of parallel to format_* vs print_* functions in the "traceback" module). Signature would format_stats(self, *restrictions, sort='stdname', strip_dirs=False). 2) Add __enter__ and __exit__ to Profile so you can use it in a "with" statement. 3) Add a top-level runcall() function to the cProfile (and profile) modules. This isn't particularly useful for me, but it'd make the module easier to use from the command line, and it's one of the API improvements over at issue 9285. Other things in issue 9285 that I don't think are a good idea: the separate runblock() context manager (there should be only one way to do it, and I think "with Profile()" is great), and the @profile decorator (I really don't see the use of this -- you don't always want to profile a function when calling it, only it certain contexts). My code implementing the above three things as a separate module (subclassing Profile) is copied below. I think the above additions are non-controversial improvements -- do you think I should make a patch to get this into the standard library? New issue or add it to 9285? Another feature that I wanted and would be useful for a lot of folks, I think, would be some way to fetch the results as proper Python objects, rather than as a file/string. Maybe a Profile.get_results() function that returns a ProfileResult namedtuple which has total time and number of calls etc, as well as a list of ProfileEntry namedtuples that have the data for each function. Thoughts on that (before any bike-shedding on the exact API)? Currently folks who need this data have to use undocumented parts of Profile like .stats and .fcn_list, for example the Pyramid debug toolbar extracts this data so it can render it in an HTML web page: https://github.com/Pylons/pyramid_debugtoolbar/blob/ed406d7f3c8581458c2e7bdf... Thanks! Ben # BELOW is my profileutils.py module """Code profiling utilities.
def foo(n): ... return sum(sum(range(i)) for i in range(n))
with Profile() as profiler: ... result = foo(5) print(profiler.format_stats(3, sort='nfl')) # doctest: +ELLIPSIS 15 function calls (10 primitive calls) in 0.000 seconds <BLANKLINE> Ordered by: name/file/line List reduced from 5 to 3 due to restriction <3> <BLANKLINE> ncalls tottime percall cumtime percall filename:lineno(function) 6/1 0.000 0.000 0.000 0.000 {built-in method builtins.sum} 6 0.000 0.000 0.000 0.000 <...>:2(<genexpr>) 1 0.000 0.000 0.000 0.000 {method 'disable' ...} <BLANKLINE> <BLANKLINE> <BLANKLINE> result 10
result = runcall(foo, 10) # doctest: +ELLIPSIS 24 function calls (14 primitive calls) in 0.000 seconds <BLANKLINE> Ordered by: standard name <BLANKLINE> ncalls tottime percall cumtime percall filename:lineno(function) 1 0.000 0.000 0.000 0.000 <...>:1(foo) 11 0.000 0.000 0.000 0.000 <...>:2(<genexpr>) 11/1 0.000 0.000 0.000 0.000 {built-in method builtins.sum} 1 0.000 0.000 0.000 0.000 {method 'disable' ...} <BLANKLINE> <BLANKLINE> result 120 """
import cProfile import io import pstats VALID_SORTS = [ 'calls', 'cumtime', 'cumulative', 'file', 'filename', 'line', 'module', 'name', 'ncalls', 'nfl', 'pcalls', 'stdname', 'time', 'tottime', ] class Profile(cProfile.Profile): def format_stats(self, *restrictions, sort='stdname', strip_dirs=False): """Format stats report but return as string instead of printing to stdout. "restrictions" are as per Stats.print_stats(). Entries are sorted as per Stats.sort_stats(sort), and directories are stripped from filenames if strip_dirs is True. """ sort_keys = (sort,) if isinstance(sort, (int, str)) else sort stream = io.StringIO() stats = pstats.Stats(self, stream=stream) stats.sort_stats(*sort_keys) if strip_dirs: stats.strip_dirs() stats.print_stats(*restrictions) return stream.getvalue() # Define __enter__ and __exit__ to enable "with statement" support def __enter__(self): self.enable() return self def __exit__(self, exc_type, exc_value, exc_traceback): self.disable() def runcall(func, *args, **kwargs): """Profile running func(*args, **kwargs) and print stats to stdout.""" profiler = cProfile.Profile() result = profiler.runcall(func, *args, **kwargs) profiler.print_stats() return result if __name__ == '__main__': import doctest doctest.testmod()
I use an @profile() decorator for almost all my profiling. If you want to
profile function foo you just decorate it and re-run the program.
With a with block you have to find the places where foo is called and put
with statements around the calls.
I think both approaches are equally valid and useful.
On 3 November 2016 at 07:30, Ben Hoyt
Hi folks,
Every time I do some Python profiling (with cProfile) the API feels kinda baroque, and I have to write ~10 line helper functions to do what I want. For example, if I want the return value of running some function and the profiling output as a string (e.g., to send as part of a web response), I have to do something like this:
import cProfile import io import pstats
def profile(func, *args, **kwargs): profiler = cProfile.Profile() result = profiler.runcall(func, *args, **kwargs) stream = io.StringIO() stats = pstats.Stats(profiler, stream=stream) stats.sort_stats('cumulative') stats.print_stats(strip_dirs=True) return (result, stream.getvalue())
Something like that might be a useful function in its own right, but I took a look at the current API, and also an open issue that addresses some of this (https://bugs.python.org/issue9285 last touched almost 4 years ago), and came up with the following:
1) Add a format_stats() function to Profile to give the profiling results as a string (kind of parallel to format_* vs print_* functions in the "traceback" module). Signature would format_stats(self, *restrictions, sort='stdname', strip_dirs=False).
2) Add __enter__ and __exit__ to Profile so you can use it in a "with" statement.
3) Add a top-level runcall() function to the cProfile (and profile) modules. This isn't particularly useful for me, but it'd make the module easier to use from the command line, and it's one of the API improvements over at issue 9285.
Other things in issue 9285 that I don't think are a good idea: the separate runblock() context manager (there should be only one way to do it, and I think "with Profile()" is great), and the @profile decorator (I really don't see the use of this -- you don't always want to profile a function when calling it, only it certain contexts).
My code implementing the above three things as a separate module (subclassing Profile) is copied below.
I think the above additions are non-controversial improvements -- do you think I should make a patch to get this into the standard library? New issue or add it to 9285?
Another feature that I wanted and would be useful for a lot of folks, I think, would be some way to fetch the results as proper Python objects, rather than as a file/string. Maybe a Profile.get_results() function that returns a ProfileResult namedtuple which has total time and number of calls etc, as well as a list of ProfileEntry namedtuples that have the data for each function. Thoughts on that (before any bike-shedding on the exact API)?
Currently folks who need this data have to use undocumented parts of Profile like .stats and .fcn_list, for example the Pyramid debug toolbar extracts this data so it can render it in an HTML web page: https://github.com/Pylons/pyramid_debugtoolbar/blob/ ed406d7f3c8581458c2e7bdf25e11e9ee8e3d489/pyramid_debugtoolbar/panels/ performance.py#L93
Thanks! Ben
# BELOW is my profileutils.py module
"""Code profiling utilities.
def foo(n): ... return sum(sum(range(i)) for i in range(n))
with Profile() as profiler: ... result = foo(5) print(profiler.format_stats(3, sort='nfl')) # doctest: +ELLIPSIS 15 function calls (10 primitive calls) in 0.000 seconds <BLANKLINE> Ordered by: name/file/line List reduced from 5 to 3 due to restriction <3> <BLANKLINE> ncalls tottime percall cumtime percall filename:lineno(function) 6/1 0.000 0.000 0.000 0.000 {built-in method builtins.sum} 6 0.000 0.000 0.000 0.000 <...>:2(<genexpr>) 1 0.000 0.000 0.000 0.000 {method 'disable' ...} <BLANKLINE> <BLANKLINE> <BLANKLINE> result 10
result = runcall(foo, 10) # doctest: +ELLIPSIS 24 function calls (14 primitive calls) in 0.000 seconds <BLANKLINE> Ordered by: standard name <BLANKLINE> ncalls tottime percall cumtime percall filename:lineno(function) 1 0.000 0.000 0.000 0.000 <...>:1(foo) 11 0.000 0.000 0.000 0.000 <...>:2(<genexpr>) 11/1 0.000 0.000 0.000 0.000 {built-in method builtins.sum} 1 0.000 0.000 0.000 0.000 {method 'disable' ...} <BLANKLINE> <BLANKLINE> result 120 """
import cProfile import io import pstats
VALID_SORTS = [ 'calls', 'cumtime', 'cumulative', 'file', 'filename', 'line', 'module', 'name', 'ncalls', 'nfl', 'pcalls', 'stdname', 'time', 'tottime', ]
class Profile(cProfile.Profile): def format_stats(self, *restrictions, sort='stdname', strip_dirs=False): """Format stats report but return as string instead of printing to stdout. "restrictions" are as per Stats.print_stats(). Entries are sorted as per Stats.sort_stats(sort), and directories are stripped from filenames if strip_dirs is True. """ sort_keys = (sort,) if isinstance(sort, (int, str)) else sort stream = io.StringIO() stats = pstats.Stats(self, stream=stream) stats.sort_stats(*sort_keys) if strip_dirs: stats.strip_dirs() stats.print_stats(*restrictions) return stream.getvalue()
# Define __enter__ and __exit__ to enable "with statement" support
def __enter__(self): self.enable() return self
def __exit__(self, exc_type, exc_value, exc_traceback): self.disable()
def runcall(func, *args, **kwargs): """Profile running func(*args, **kwargs) and print stats to stdout.""" profiler = cProfile.Profile() result = profiler.runcall(func, *args, **kwargs) profiler.print_stats() return result
if __name__ == '__main__': import doctest doctest.testmod()
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
-- *Tim Mitchell* Senior Software Developer *phone:* +64 3 961 1031 ext. 217 *email:* tim.mitchell@leapfrog3d.com *skype: *tim.mitchell.leapfrog3d *address:* 41 Leslie Hills Drive, Riccarton, Christchurch, 8011, New Zealand www.leapfrog3d.com An ARANZ Geo Limited Product http://www.aranzgeo.com http://www.facebook.com/leapfrog3d http://https://www.linkedin.com/company/aranz-geo-limited ------------------------------
Okay, got it, that sounds fair enough. With your @profile decorator how do
you tell it when and where to print the output? Can you post the source for
your decorator?
On Wed, Nov 2, 2016 at 4:52 PM, Tim Mitchell
I use an @profile() decorator for almost all my profiling. If you want to profile function foo you just decorate it and re-run the program. With a with block you have to find the places where foo is called and put with statements around the calls. I think both approaches are equally valid and useful.
On 3 November 2016 at 07:30, Ben Hoyt
wrote: Hi folks,
Every time I do some Python profiling (with cProfile) the API feels kinda baroque, and I have to write ~10 line helper functions to do what I want. For example, if I want the return value of running some function and the profiling output as a string (e.g., to send as part of a web response), I have to do something like this:
import cProfile import io import pstats
def profile(func, *args, **kwargs): profiler = cProfile.Profile() result = profiler.runcall(func, *args, **kwargs) stream = io.StringIO() stats = pstats.Stats(profiler, stream=stream) stats.sort_stats('cumulative') stats.print_stats(strip_dirs=True) return (result, stream.getvalue())
Something like that might be a useful function in its own right, but I took a look at the current API, and also an open issue that addresses some of this (https://bugs.python.org/issue9285 last touched almost 4 years ago), and came up with the following:
1) Add a format_stats() function to Profile to give the profiling results as a string (kind of parallel to format_* vs print_* functions in the "traceback" module). Signature would format_stats(self, *restrictions, sort='stdname', strip_dirs=False).
2) Add __enter__ and __exit__ to Profile so you can use it in a "with" statement.
3) Add a top-level runcall() function to the cProfile (and profile) modules. This isn't particularly useful for me, but it'd make the module easier to use from the command line, and it's one of the API improvements over at issue 9285.
Other things in issue 9285 that I don't think are a good idea: the separate runblock() context manager (there should be only one way to do it, and I think "with Profile()" is great), and the @profile decorator (I really don't see the use of this -- you don't always want to profile a function when calling it, only it certain contexts).
My code implementing the above three things as a separate module (subclassing Profile) is copied below.
I think the above additions are non-controversial improvements -- do you think I should make a patch to get this into the standard library? New issue or add it to 9285?
Another feature that I wanted and would be useful for a lot of folks, I think, would be some way to fetch the results as proper Python objects, rather than as a file/string. Maybe a Profile.get_results() function that returns a ProfileResult namedtuple which has total time and number of calls etc, as well as a list of ProfileEntry namedtuples that have the data for each function. Thoughts on that (before any bike-shedding on the exact API)?
Currently folks who need this data have to use undocumented parts of Profile like .stats and .fcn_list, for example the Pyramid debug toolbar extracts this data so it can render it in an HTML web page: https://github.com/Pylons/pyramid_debugtoolbar/blob/ed406d7f 3c8581458c2e7bdf25e11e9ee8e3d489/pyramid_debugtoolbar/ panels/performance.py#L93
Thanks! Ben
# BELOW is my profileutils.py module
"""Code profiling utilities.
def foo(n): ... return sum(sum(range(i)) for i in range(n))
with Profile() as profiler: ... result = foo(5) print(profiler.format_stats(3, sort='nfl')) # doctest: +ELLIPSIS 15 function calls (10 primitive calls) in 0.000 seconds <BLANKLINE> Ordered by: name/file/line List reduced from 5 to 3 due to restriction <3> <BLANKLINE> ncalls tottime percall cumtime percall filename:lineno(function) 6/1 0.000 0.000 0.000 0.000 {built-in method builtins.sum} 6 0.000 0.000 0.000 0.000 <...>:2(<genexpr>) 1 0.000 0.000 0.000 0.000 {method 'disable' ...} <BLANKLINE> <BLANKLINE> <BLANKLINE> result 10
result = runcall(foo, 10) # doctest: +ELLIPSIS 24 function calls (14 primitive calls) in 0.000 seconds <BLANKLINE> Ordered by: standard name <BLANKLINE> ncalls tottime percall cumtime percall filename:lineno(function) 1 0.000 0.000 0.000 0.000 <...>:1(foo) 11 0.000 0.000 0.000 0.000 <...>:2(<genexpr>) 11/1 0.000 0.000 0.000 0.000 {built-in method builtins.sum} 1 0.000 0.000 0.000 0.000 {method 'disable' ...} <BLANKLINE> <BLANKLINE> result 120 """
import cProfile import io import pstats
VALID_SORTS = [ 'calls', 'cumtime', 'cumulative', 'file', 'filename', 'line', 'module', 'name', 'ncalls', 'nfl', 'pcalls', 'stdname', 'time', 'tottime', ]
class Profile(cProfile.Profile): def format_stats(self, *restrictions, sort='stdname', strip_dirs=False): """Format stats report but return as string instead of printing to stdout. "restrictions" are as per Stats.print_stats(). Entries are sorted as per Stats.sort_stats(sort), and directories are stripped from filenames if strip_dirs is True. """ sort_keys = (sort,) if isinstance(sort, (int, str)) else sort stream = io.StringIO() stats = pstats.Stats(self, stream=stream) stats.sort_stats(*sort_keys) if strip_dirs: stats.strip_dirs() stats.print_stats(*restrictions) return stream.getvalue()
# Define __enter__ and __exit__ to enable "with statement" support
def __enter__(self): self.enable() return self
def __exit__(self, exc_type, exc_value, exc_traceback): self.disable()
def runcall(func, *args, **kwargs): """Profile running func(*args, **kwargs) and print stats to stdout.""" profiler = cProfile.Profile() result = profiler.runcall(func, *args, **kwargs) profiler.print_stats() return result
if __name__ == '__main__': import doctest doctest.testmod()
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
--
*Tim Mitchell* Senior Software Developer
*phone:* +64 3 961 1031 ext. 217 *email:* tim.mitchell@leapfrog3d.com *skype: *tim.mitchell.leapfrog3d *address:* 41 Leslie Hills Drive, Riccarton, Christchurch, 8011, New Zealand www.leapfrog3d.com
An ARANZ Geo Limited Product http://www.aranzgeo.com
http://www.facebook.com/leapfrog3d http://https://www.linkedin.com/company/aranz-geo-limited ------------------------------
Hi Ben,
Mostly I just print to stdout, I imagine more flexibility would be needed
in general.
This is for python 2.7 - don't know if it works for 3.
def profile(sort='time', restriction=(), callers=None, callees=None,
filename=None):
def _profileDecorator(func):
"print profile stats for decorated function"
def wrapper(*args, **kwargs):
print 'Profile for:', func.__name__
prof = cProfile.Profile()
result = prof.runcall(func, *args, **kwargs)
_, statsFileName = tempfile.mkstemp()
prof.dump_stats(statsFileName)
if filename is None:
stats = pstats.Stats(statsFileName)
else:
stats = pstats.Stats(statsFileName, stream=open(filename, 'w'))
if isinstance(sort, basestring):
stats.sort_stats(sort)
else:
stats.sort_stats(*sort)
if isinstance(restriction, (tuple, list)):
stats.print_stats(*restriction)
else:
stats.print_stats(restriction)
if callers is not None:
if isinstance(callers, basestring):
stats.print_callers(callers)
else:
stats.print_callers(*callers)
if callees is not None:
if isinstance(callees, basestring):
stats.print_callees(callees)
else:
stats.print_callees(*callees)
return result
return wrapper
return _profileDecorator
Cheers
Tim
On 3 November 2016 at 09:58, Ben Hoyt
Okay, got it, that sounds fair enough. With your @profile decorator how do you tell it when and where to print the output? Can you post the source for your decorator?
On Wed, Nov 2, 2016 at 4:52 PM, Tim Mitchell
wrote: I use an @profile() decorator for almost all my profiling. If you want to profile function foo you just decorate it and re-run the program. With a with block you have to find the places where foo is called and put with statements around the calls. I think both approaches are equally valid and useful.
conduct/ http://python.org/psf/codeofconduct/
Long ago I posted a patch for this (decorator + context manager) but I
bumped into a weird error I wasn't able to fix (see last comment):
http://bugs.python.org/issue9285
On Wed, Nov 2, 2016 at 10:45 PM, Tim Mitchell
Hi Ben,
Mostly I just print to stdout, I imagine more flexibility would be needed in general.
This is for python 2.7 - don't know if it works for 3.
def profile(sort='time', restriction=(), callers=None, callees=None, filename=None): def _profileDecorator(func): "print profile stats for decorated function" def wrapper(*args, **kwargs): print 'Profile for:', func.__name__
prof = cProfile.Profile() result = prof.runcall(func, *args, **kwargs) _, statsFileName = tempfile.mkstemp() prof.dump_stats(statsFileName) if filename is None: stats = pstats.Stats(statsFileName) else: stats = pstats.Stats(statsFileName, stream=open(filename, 'w')) if isinstance(sort, basestring): stats.sort_stats(sort) else: stats.sort_stats(*sort) if isinstance(restriction, (tuple, list)): stats.print_stats(*restriction) else: stats.print_stats(restriction) if callers is not None: if isinstance(callers, basestring): stats.print_callers(callers) else: stats.print_callers(*callers) if callees is not None: if isinstance(callees, basestring): stats.print_callees(callees) else: stats.print_callees(*callees) return result return wrapper return _profileDecorator
Cheers
Tim
On 3 November 2016 at 09:58, Ben Hoyt
wrote: Okay, got it, that sounds fair enough. With your @profile decorator how do you tell it when and where to print the output? Can you post the source for your decorator?
On Wed, Nov 2, 2016 at 4:52 PM, Tim Mitchell
wrote:
I use an @profile() decorator for almost all my profiling. If you want to profile function foo you just decorate it and re-run the program. With a with block you have to find the places where foo is called and put with statements around the calls. I think both approaches are equally valid and useful.
conduct/ http://python.org/psf/codeofconduct/
Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
-- Giampaolo - http://grodola.blogspot.com
participants (3)
-
Ben Hoyt
-
Giampaolo Rodola'
-
Tim Mitchell