[Python-Dev] s/hotshot/lsprof

Armin Rigo arigo at tunes.org
Sat Nov 19 19:08:55 CET 2005


Hi!

The current Python profiler situation is a mess.

'profile.Profile' is the ages-old pure Python profiler.  At the end of a
run, it builds a dict that is inspected by 'pstats.Stats'.  It has some
recent support for profiling C calls, which however makes it crash in
some cases [1].  And of course it's slow (makes a run take about 10x
longer).
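
For illustration, the workflow described above (run something under
'profile.Profile', then inspect the result with 'pstats.Stats') looks
roughly like this.  It is only a minimal sketch written against the
present-day standard library; 'work' and 'profile_work' are made-up
names, not anything from the profilers themselves.

```python
import io
import profile
import pstats


def work(n):
    # Hypothetical workload, only here so that there is something to profile.
    return sum(i * i for i in range(n))


def profile_work(n=1000):
    prof = profile.Profile()
    # runcall() runs the function under the profiler and returns its result.
    result = prof.runcall(work, n)
    out = io.StringIO()
    # pstats.Stats accepts the profiler object and reads the dict it built.
    stats = pstats.Stats(prof, stream=out)
    stats.sort_stats("cumulative").print_stats(5)
    return result, stats, out.getvalue()


result, stats, report = profile_work()
```

Printing 'report' shows the usual table of call counts and timings;
'stats.stats' is the underlying dict, keyed by (filename, line number,
function name).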

'hotshot', new in 2.2, is considerably faster (reportedly, only 30%
added overhead).  The log file is then loaded and turned into an
instance of the same 'pstats.Stats'.  This loading takes ages.  The
reason is that the log file only records events, and loading is done by
instantiating a 'profile.Profile' and sending it all the events.  In
other words, it takes exactly as long as the time it saved in the first
place!  Moreover, for some reason, the results given by hotshot
sometimes seem quite wrong.  (I don't understand why, but I've seen it
myself, and it's been reported by various people, e.g. [2].)  'hotshot'
doesn't know about C calls, but it can log line events, although this
information is lost(!) in the final conversion to a 'pstats.Stats'.

'lsprof' is a third profiler by Brett Rosen and Ted Czotter, posted on
SF in June [2].  Michael Hudson and I did some minor clean-ups and
improvements on it, and it seems to be quite useful.  It is, for
example, the only one of the three profilers that managed to give
sensible information about the PyPy translation process without
crashing, allowing us to accelerate it from over 30 to under 20
minutes.  The SF patch contains a more detailed account of the reasons
for writing 'lsprof'.  The current version [3] supports neither C calls
nor line events.  It has its own simple interface, which is not
compatible with either of the other two profilers.  However, unlike the
other two profilers, it can record detailed stats about children, which
I found quite useful (e.g. how much time is spent in a function when it
is called by another specific function).
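
To make the "stats about children" idea concrete, here is a toy sketch
of a profiler that accounts time per (caller, callee) pair.  It is NOT
how 'lsprof' is implemented (that is C code, see [3]); it only uses
'sys.setprofile' to illustrate the kind of data involved, and every
name in it ('CallerAwareTimer', 'helper', ...) is made up for the
example.

```python
import sys
import time
from collections import defaultdict


class CallerAwareTimer:
    """Toy profiler: accumulates wall-clock time per (caller, callee) pair."""

    def __init__(self):
        self.totals = defaultdict(float)  # (caller, callee) -> seconds
        self.counts = defaultdict(int)    # (caller, callee) -> number of calls
        self._stack = []                  # [(function name, start time)]

    def _hook(self, frame, event, arg):
        # Only Python-level calls are tracked; C call events are ignored.
        if event == "call":
            self._stack.append((frame.f_code.co_name, time.perf_counter()))
        elif event == "return" and self._stack:
            name, start = self._stack.pop()
            caller = self._stack[-1][0] if self._stack else "<top>"
            self.totals[(caller, name)] += time.perf_counter() - start
            self.counts[(caller, name)] += 1

    def runcall(self, func, *args):
        sys.setprofile(self._hook)
        try:
            return func(*args)
        finally:
            sys.setprofile(None)


def helper(n):
    return sum(range(n))


def cheap_caller():
    return helper(10)


def costly_caller():
    return helper(100_000)


def main():
    return cheap_caller() + costly_caller()


timer = CallerAwareTimer()
total = timer.runcall(main)
```

After the run, 'timer.totals' has separate entries for
('cheap_caller', 'helper') and ('costly_caller', 'helper'), which is
exactly the breakdown that a flat per-function table cannot give.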

Therefore, I think it would be a great idea to add 'lsprof' to the
standard library.  Unless there are objections, it seems that the best
plan is to keep 'profile.py' as a pure Python implementation and replace
'hotshot' with 'lsprof'.  Indeed, I don't see any obvious advantage that
'hotshot' has over 'lsprof', and I certainly see more than one downside.
Maybe someone has a use for (and undocumented ways to fish for) line
events generated by hotshot.  Well, there is a script [4] to convert
hotshot log files to some format that a KDE tool [5] can display.  (It
even looks like hotshot files were designed with this in mind.)  Given
that the people doing that can still compile 'hotshot' as a separate
extension module, it doesn't strike me as a particularly good reason to
keep Yet Another Profiler in the standard library.

So here is my plan:

Unify the interfaces of the pure Python and the C profilers a bit more.
This also means that 'lsprof' should be made to use a pstats-compatible
log format.  The 'pstats' documentation specifically says that the file
format can change: that would give 'lsprof' a place to store its
detailed children stats.
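
For reference, here is a small sketch of what a pstats file looks like
from the outside.  As far as I can tell from the present-day standard
library, 'dump_stats' simply marshals the stats dict, which maps
(filename, line number, function name) to a tuple (primitive calls,
total calls, total time, cumulative time, callers).  The 'fib'
workload and the temporary file are made up for the example, and none
of this says how 'lsprof' would extend the format.

```python
import marshal
import os
import profile
import pstats
import tempfile


def fib(n):
    # Hypothetical recursive workload.
    return n if n < 2 else fib(n - 1) + fib(n - 2)


prof = profile.Profile()
prof.runcall(fib, 10)

# Write the stats to a temporary file, then read the raw dict back.
fd, path = tempfile.mkstemp(suffix=".prof")
os.close(fd)
try:
    pstats.Stats(prof).dump_stats(path)
    with open(path, "rb") as f:
        raw = marshal.load(f)
finally:
    os.remove(path)

fib_entry = None
for (filename, lineno, funcname), (cc, nc, tt, ct, callers) in raw.items():
    if funcname == "fib":
        # cc: primitive (non-recursive) calls, nc: all calls,
        # callers: dict keyed by the calling function.
        fib_entry = (cc, nc, callers)
```

The 'callers' dict is the only per-caller information in there, which
is why a richer children record would need somewhere new to live.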

Then we can provide a dummy 'hotshot.py' for compatibility, remove its
documentation, and provide documentation for 'lsprof'.

If anyone feels like this is a bad idea, please speak up.


See you soon,

Armin


[1] https://sourceforge.net/tracker/?group_id=5470&atid=105470&func=detail&aid=1117670

[2] http://sourceforge.net/tracker/?group_id=5470&atid=305470&func=detail&aid=1212837

[3] http://codespeak.net/svn/user/arigo/hack/misc/lsprof (Subversion)

[4] http://mail.python.org/pipermail/python-list/2003-September/183887.html

[5] http://kcachegrind.sourceforge.net/cgi-bin/show.cgi

