[Python-Dev] explanations for more pybench slowdowns

Fri, 18 May 2001 17:07:37 -0400 (EDT)

I did some profiles of more of the pybench slowdowns this afternoon
and found a few causes for several problem benchmarks.

I just made a couple of small changes for BuiltinFunctionCalls.  The
problem here is that PyCFunction calls were optimized for flags == 0
and not flags == METH_VARARGS, which is more common.

The scary thing about BuiltinFunctionCalls is that the profiler shows
it spending almost 30% of its time in PyArg_ParseTuple().  It
certainly is a shame that we have this complicated, slow run-time
parsing mechanism to deal with a static property of the code, namely
how many arguments it takes and what their types are.
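
To make the complaint concrete, here is a toy Python model of
format-string argument parsing.  It is only a sketch: the codes "s",
"i" and "d" mirror real PyArg_ParseTuple() codes, but parse_tuple()
and repeat_string() are hypothetical names invented for illustration
and do not reflect how the C code is written.

```python
# Toy model: these format codes mirror a few real PyArg_ParseTuple codes,
# but this parser is a hypothetical illustration, not CPython's code.
FORMAT_TYPES = {"i": int, "d": float, "s": str}


def parse_tuple(args, fmt):
    """Walk the format string on every call, checking count and types."""
    if len(args) != len(fmt):
        raise TypeError("expected %d arguments, got %d" % (len(fmt), len(args)))
    for position, (arg, code) in enumerate(zip(args, fmt)):
        expected = FORMAT_TYPES[code]
        if not isinstance(arg, expected):
            raise TypeError(
                "argument %d must be %s" % (position + 1, expected.__name__)
            )
    return args


def repeat_string(*args):
    # The format "si" never changes, yet it is re-interpreted on each call.
    text, count = parse_tuple(args, "si")
    return text * count
```

The point of the sketch is that "si" is known when repeat_string() is
written, but the arity and type checks are still redone by walking
the string each time the function runs.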

A few of the other tests, SimpleComplexArithmetic and
CreateStringsWithConcat, are slower because of the new coercion
logic.  I didn't spend much time on SimpleComplexArithmetic, but I did
look at CreateStringsWithConcat in some detail.  The basic problem is
that "ab" + "cd" gets compiled to BINARY_ADD, which in turn calls
PyNumber_Add("ab", "cd").  This function tries all sorts of different
ways to coerce the strings into addable numbers before giving up and
trying sequence concat.
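
A rough way to picture that ordering is the toy Python model below.
It is an assumption-laden sketch, not the C implementation:
model_number_add() and the step descriptions it records are invented
for illustration, and the number of steps is not a measurement.

```python
# Toy model only: the real logic is C code (PyNumber_Add).
# The step names recorded in `trace` are illustrative, not measured.
NUMERIC_TYPES = (int, float, complex)


def model_number_add(v, w, trace):
    """Try numeric addition first, then fall back to sequence concat."""
    trace.append("look for a numeric add on the left operand")
    trace.append("look for a numeric add on the right operand")
    if isinstance(v, NUMERIC_TYPES) and isinstance(w, NUMERIC_TYPES):
        return v + w
    trace.append("attempt coercion to a common numeric type")
    trace.append("give up on numbers, try sequence concat")
    if isinstance(v, (str, list, tuple)) and type(v) is type(w):
        return v + w
    raise TypeError("unsupported operand types for +")


int_trace, str_trace = [], []
int_result = model_number_add(1, 2, int_trace)
str_result = model_number_add("ab", "cd", str_trace)
print(len(int_trace), "steps for ints;", len(str_trace), "steps for strings")
```

In this model the string case always pays for the failed numeric
attempts before it reaches the concat branch, which is the shape of
the slowdown described above.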

It looks like the new coercion rules have optimized number ops at the
expense of string ops.  If you're writing programs with lots of
numbers, you probably think that's peachy.  If you're parsing HTML,
perhaps you don't :-).

I looked at the test suite to see how often PyNumber_Add() is called
with non-number arguments.  The answer is 77% of the time, but almost all
of those calls are from test_unicodedata.  If that one test is
excluded, the majority of the calls (~90%) are with numbers.  But the
majority of those calls just come from a few tests -- test_pow,
test_long, test_mutants, test_strftime.

If I were to do something about the coercions, I would see if there
was a way to quickly determine that PyNumber_Add() ain't gonna have
any luck.  Then we could bail to things like string_concat more
quickly.
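
One possible shape for such a check, sketched as a toy Python model
rather than a patch: test for the obviously non-numeric case up front
and skip the numeric machinery entirely.  The function names and the
counter dictionary are hypothetical, and whether an exact-type test
like this is the right check in C is an open question.

```python
# Toy model of an early bail-out; all names here are hypothetical.
NUMERIC_TYPES = (int, float, complex)


def slow_numeric_attempt(v, w, counter):
    """Stand-in for the expensive numeric and coercion path."""
    counter["numeric_attempts"] += 1
    if isinstance(v, NUMERIC_TYPES) and isinstance(w, NUMERIC_TYPES):
        return v + w
    return NotImplemented


def model_add_with_bailout(v, w, counter):
    # Assumed fast check: two strings can never be added as numbers,
    # so go straight to concatenation.
    if type(v) is str and type(w) is str:
        counter["fast_concats"] += 1
        return v + w
    result = slow_numeric_attempt(v, w, counter)
    if result is not NotImplemented:
        return result
    return v + w  # generic fallback: sequence concat or TypeError


counter = {"numeric_attempts": 0, "fast_concats": 0}
joined = model_add_with_bailout("ab", "cd", counter)
total = model_add_with_bailout(1, 2, counter)
```

With this ordering the string case never touches the numeric path,
at the cost of one extra type test on every numeric add.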

I also looked at SmallLists.  It seems that the only significant
change since 1.5.2 is the garbage collection.  This test spends a lot
more time deallocating lists than it used to, and the only change I
see in the code is the GC.  I assume, but haven't checked, that the
story is similar for SmallTuples.
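
A small experiment along these lines can be run from Python using
the gc module.  This is only a sketch under some assumptions:
churn_small_lists() and timed_run() are hypothetical helpers, not the
pybench test, and gc.disable() only stops automatic collector passes.
It does not remove the per-object tracking work done when a list is
allocated or freed, so any difference it shows is at best a lower
bound on the GC cost.

```python
import gc
import time


def churn_small_lists(count=50000):
    """Create and immediately drop many three-element lists."""
    for i in range(count):
        small = [i, i + 1, i + 2]
    return count


def timed_run(collector_enabled):
    """Time one churn with automatic collection switched on or off."""
    was_enabled = gc.isenabled()
    if collector_enabled:
        gc.enable()
    else:
        gc.disable()
    try:
        start = time.perf_counter()
        churn_small_lists()
        return time.perf_counter() - start
    finally:
        # Restore whatever state the caller had.
        if was_enabled:
            gc.enable()
        else:
            gc.disable()


# In CPython, lists are tracked by the collector; small ints are not.
tracked_list = gc.is_tracked([1, 2, 3])
tracked_int = gc.is_tracked(1)
seconds_on = timed_run(True)
seconds_off = timed_run(False)
print("list tracked:", tracked_list, "int tracked:", tracked_int)
print("collector on: %.4fs  off: %.4fs" % (seconds_on, seconds_off))
```

Timings from a single run like this are noisy, so repeat it several
times before reading anything into the numbers.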

So the primary things that have slowed down since 1.5.2 seem to be:
comparisons, coercion, and memory management for containers.  These
also seem to be the things that have improved the most in terms of
features, completeness, etc.  Looks like we need to revisit them and
sort out the performance issues.

Jeremy