Mailman 3 Small improvements to the profile/cProfile API - Python-ideas

2 Nov 2016

      Hi folks,

Every time I do some Python profiling (with cProfile) the API feels kinda
baroque, and I have to write ~10 line helper functions to do what I want.
For example, if I want the return value of running some function and the
profiling output as a string (e.g., to send as part of a web response), I
have to do something like this:

    import cProfile
    import io
    import pstats

    def profile(func, *args, **kwargs):
        profiler = cProfile.Profile()
        result = profiler.runcall(func, *args, **kwargs)
        stream = io.StringIO()
        stats = pstats.Stats(profiler, stream=stream)
        stats.sort_stats('cumulative')
        stats.print_stats(strip_dirs=True)
        return (result, stream.getvalue())

Something like that might be a useful function in its own right, but I took
a look at the current API, and also an open issue that addresses some of
this (https://bugs.python.org/issue9285 last touched almost 4 years ago),
and came up with the following:

1) Add a format_stats() function to Profile to give the profiling results
as a string (kind of parallel to format_* vs print_* functions in the
"traceback" module). Signature would format_stats(self, *restrictions,
sort='stdname', strip_dirs=False).

2) Add __enter__ and __exit__ to Profile so you can use it in a "with"
statement.

3) Add a top-level runcall() function to the cProfile (and profile)
modules. This isn't particularly useful for me, but it'd make the module
easier to use from the command line, and it's one of the API improvements
over at issue 9285.

Other things in issue 9285 that I don't think are a good idea: the separate
runblock() context manager (there should be only one way to do it, and I
think "with Profile()" is great), and the @profile decorator (I really
don't see the use of this -- you don't always want to profile a function
when calling it, only it certain contexts).

My code implementing the above three things as a separate module
(subclassing Profile) is copied below.

I think the above additions are non-controversial improvements -- do you
think I should make a patch to get this into the standard library? New
issue or add it to 9285?

Another feature that I wanted and would be useful for a lot of folks, I
think, would be some way to fetch the results as proper Python objects,
rather than as a file/string. Maybe a Profile.get_results() function that
returns a ProfileResult namedtuple which has total time and number of calls
etc, as well as a list of ProfileEntry namedtuples that have the data for
each function. Thoughts on that (before any bike-shedding on the exact API)?

Currently folks who need this data have to use undocumented parts of
Profile like .stats and .fcn_list, for example the Pyramid debug toolbar
extracts this data so it can render it in an HTML web page:
https://github.com/Pylons/pyramid_debugtoolbar/blob/ed406d7f3c8581458c2e7bdf...

Thanks!
Ben

# BELOW is my profileutils.py module

"""Code profiling utilities.
...
...
...
def foo(n):
...     return sum(sum(range(i)) for i in range(n))
...
...
...
with Profile() as profiler:
...     result = foo(5)
print(profiler.format_stats(3, sort='nfl'))  # doctest: +ELLIPSIS
         15 function calls (10 primitive calls) in 0.000 seconds
<BLANKLINE>
   Ordered by: name/file/line
   List reduced from 5 to 3 due to restriction <3>
<BLANKLINE>
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      6/1    0.000    0.000    0.000    0.000 {built-in method builtins.sum}
        6    0.000    0.000    0.000    0.000 <...>:2(<genexpr>)
        1    0.000    0.000    0.000    0.000 {method 'disable' ...}
<BLANKLINE>
<BLANKLINE>
<BLANKLINE>
result
10
...
...
...
result = runcall(foo, 10)  # doctest: +ELLIPSIS
         24 function calls (14 primitive calls) in 0.000 seconds
<BLANKLINE>
   Ordered by: standard name
<BLANKLINE>
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 <...>:1(foo)
       11    0.000    0.000    0.000    0.000 <...>:2(<genexpr>)
     11/1    0.000    0.000    0.000    0.000 {built-in method builtins.sum}
        1    0.000    0.000    0.000    0.000 {method 'disable' ...}
<BLANKLINE>
<BLANKLINE>
result
120
"""
import cProfile
import io
import pstats

VALID_SORTS = [
    'calls', 'cumtime', 'cumulative', 'file', 'filename',
    'line', 'module', 'name', 'ncalls', 'nfl',
    'pcalls', 'stdname', 'time', 'tottime',
]

class Profile(cProfile.Profile):
    def format_stats(self, *restrictions, sort='stdname', strip_dirs=False):
        """Format stats report but return as string instead of printing to
        stdout. "restrictions" are as per Stats.print_stats(). Entries are
        sorted as per Stats.sort_stats(sort), and directories are stripped
        from filenames if strip_dirs is True.
        """
        sort_keys = (sort,) if isinstance(sort, (int, str)) else sort
        stream = io.StringIO()
        stats = pstats.Stats(self, stream=stream)
        stats.sort_stats(*sort_keys)
        if strip_dirs:
            stats.strip_dirs()
        stats.print_stats(*restrictions)
        return stream.getvalue()

    # Define __enter__ and __exit__ to enable "with statement" support

    def __enter__(self):
        self.enable()
        return self

    def __exit__(self, exc_type, exc_value, exc_traceback):
        self.disable()

def runcall(func, *args, **kwargs):
    """Profile running func(*args, **kwargs) and print stats to stdout."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    profiler.print_stats()
    return result

if __name__ == '__main__':
    import doctest
    doctest.testmod()

Small improvements to the profile/cProfile API

Ben Hoyt

Tim Mitchell

Ben Hoyt

Tim Mitchell

Giampaolo Rodola'

tags

participants (3)