[Python-Dev] PyDict_SetItem hook

Collin Winter collinw at gmail.com
Fri Apr 3 19:18:04 CEST 2009


On Fri, Apr 3, 2009 at 9:43 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Thomas Wouters <thomas <at> python.org> writes:
>>
>> Really? Have you tried it? I get at least 5% noise between runs without any
> changes. I have gotten results that include *negative* run times.
>
> That's an implementation problem, not an issue with the tests themselves.
> Perhaps a better timing mechanism could be inspired from the timeit module.
> Perhaps the default numbers of iterations should be higher (many subtests run
> in less than 100ms on a modern CPU, which might be too low for accurate
> measurement). Perhaps the so-called "calibration" should just be disabled.
> etc.
>
>> The tests in PyBench are not micro-benchmarks (they do way too much for
> that),
>
> Then I wonder what you call a micro-benchmark. Should it involve direct calls
> to
> low-level C API functions?

I agree that a suite of microbenchmarks is supremely useful: I would
very much like to be able to isolate, say, raise statement
performance. PyBench suffers from implementation defects that in its
current incarnation make it unsuitable for this, though:
- It does not effectively isolate component performance as it claims.
When I was working on a change to BINARY_MODULO to make string
formatting faster, PyBench would report that floating point math got
slower, or that generator yields got slower. There is a lot of random
noise in the results.
- We have observed overall performance swings of 10-15% between runs
on the same machine, using the same Python binary. Using the same
binary on the same unloaded machine should give as close an answer to
0% as possible.
- I wish PyBench actually did more isolation.
Call.py:ComplexPythonFunctionCalls is on my mind right now; I wish it
didn't put keyword arguments and **kwargs in the same microbenchmark.
- In experimenting with gcc 4.4's FDO support, I produced a training
load that resulted in a 15-30% performance improvement (depending on
benchmark) across all benchmarks. Using this trained binary, PyBench
slowed down by 10%.
- I would like to see PyBench incorporate better statistics for
indicating the significance of the observed performance difference.

I don't believe that these are insurmountable problems, though. A
great contribution to Python performance work would be an improved
version of PyBench that corrects these problems and offers more
precise measurements. Is that something you might be interested in
contributing to? As performance moves more into the wider
consciousness, having good tools will become increasingly important.

Thanks,
Collin


More information about the Python-Dev mailing list