[Numpy-discussion] FYI: a (new?) practical profiling tool on Linux
David Cournapeau
cournape at gmail.com
Thu Jan 7 22:12:20 EST 2010
Hi,
I don't know how many people are aware of it, but I recently
discovered perf, a profiling tool that ships with the Linux kernel
sources. It is extremely simple to use, and very useful for
investigating numpy/scipy performance issues in compiled code. For
example, when looking at the performance of the numpy neighborhood
iterator, I get results like these with one simple command, without
any special compilation flags:
    44.69%  python  /home/david/local/stow/scipy.git/lib/python2.6/site-packages/scipy/signal/sigtools.so  [.] _imp_correlate_nd_double
    39.47%  python  /home/david/local/stow/numpy-1.4.0/lib/python2.6/site-packages/numpy/core/multiarray.so  [.] get_ptr_constant
     9.98%  python  /home/david/local/stow/numpy-1.4.0/lib/python2.6/site-packages/numpy/core/multiarray.so  [.] get_ptr_simple
     0.65%  python  /usr/bin/python2.6  [.] 0x0000000012b8a0
     0.40%  python  /usr/bin/python2.6  [.] 0x000000000a6662
     0.37%  python  /usr/bin/python2.6  [.] 0x0000000004c10d
     0.32%  python  /usr/bin/python2.6  [.] PyEval_EvalFrameEx
     0.15%  python  [kernel]  [k] __d_lookup
     0.14%  python  /lib/libc-2.10.1.so  [.] _int_malloc
     0.12%  python  /usr/bin/python2.6  [.] 0x0000000004f90e
     0.10%  python  [kernel]  [k] __link_path_walk
     0.09%  python  /usr/bin/python2.6  [.] PyObject_Malloc
     0.09%  python  /lib/ld-2.10.1.so  [.] do_lookup_x
     0.09%  python  /lib/libc-2.10.1.so  [.] __GI_memcpy
     0.08%  python  [kernel]  [k] __ticket_spin_lock
     0.07%  python  /usr/bin/python2.6  [.] PyParser_AddToken
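To try the same workflow on your own compiled code, a small C kernel like the one below can serve as a test subject. It is a hypothetical, deliberately naive 1-D "valid"-mode correlation, not scipy's _imp_correlate_nd_double (whose source is not shown here); the name correlate_valid_1d is made up for illustration. The idea is to call it in a loop from a driver program built with ordinary optimization flags, run that program under `perf record`, and browse the samples with `perf report` (or `perf annotate` for a per-instruction view). Function names show up as long as the binary is not stripped; interleaving the C source, as in the annotation below, additionally needs debug info (e.g. gcc's -g).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example kernel: naive "valid"-mode 1-D correlation.
 * out must have room for nx - nk + 1 elements.
 * out[i] = sum over j of x[i + j] * k[j]                              */
void correlate_valid_1d(const double *x, size_t nx,
                        const double *k, size_t nk,
                        double *out)
{
    /* caller must pass a non-empty kernel no longer than the signal */
    assert(nk > 0 && nk <= nx);

    for (size_t i = 0; i + nk <= nx; ++i) {
        double acc = 0.0;
        /* inner loop: where a sampling profiler should find most samples */
        for (size_t j = 0; j < nk; ++j)
            acc += x[i + j] * k[j];
        out[i] = acc;
    }
}
```

A driver calling this many times on a large array, built with something like `gcc -O2 -g` and run as `perf record ./driver` followed by `perf report` (the driver name is hypothetical), would be expected to show correlate_valid_1d near the top of the list.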
And, even cooler, annotated source code:
------------------------------------------------
 Percent |      Source code & Disassembly of multiarray.so
------------------------------------------------
         :
         :
         :
         :      Disassembly of section .text:
         :
         :      000000000001d8a0 <get_ptr_constant>:
         :              _coordinates[c] = bd;
         :
         :      /* set the dataptr from its current coordinates */
         :      static char*
         :      get_ptr_constant(PyArrayIterObject* _iter, npy_intp *coordinates)
         :      {
   15.69 :        1d8a0:   48 81 ec 08 01 00 00    sub    $0x108,%rsp
         :          int i;
         :          npy_intp bd, _coordinates[NPY_MAXDIMS];
         :          PyArrayNeighborhoodIterObject *niter = (PyArrayNeighborhoodIterObject*)_iter;
         :          PyArrayIterObject *p = niter->_internal_iter;
         :
         :          for(i = 0; i < niter->nd; ++i) {
    0.02 :        1d8a7:   48 83 bf 48 0a 00 00    cmpq   $0x0,0xa48(%rdi)
    0.00 :        1d8ae:   00
         :      get_ptr_constant(PyArrayIterObject* _iter, npy_intp *coordinates)
         :      {
         :          int i;
         :          npy_intp bd, _coordinates[NPY_MAXDIMS];
         :          PyArrayNeighborhoodIterObject *niter = (PyArrayNeighborhoodIterObject*)_iter;
         :          PyArrayIterObject *p = niter->_internal_iter;
    0.01 :        1d8af:   48 8b 87 50 0b 00 00    mov    0xb50(%rdi),%rax
         :
         :          for(i = 0; i < niter->nd; ++i) {
    7.92 :        1d8b6:   7e 64                   jle    1d91c <get_ptr_constant+0x7c>
         :              _INF_SET_PTR(i)
    0.01 :        1d8b8:   48 8b 0e                mov    (%rsi),%rcx
    0.00 :        1d8bb:   48 03 48 28             add    0x28(%rax),%rcx
    0.03 :        1d8bf:   48 3b 88 40 07 00 00    cmp    0x740(%rax),%rcx
    7.97 :        1d8c6:   7c 68                   jl     1d930 <get_ptr_constant+0x90>
    0.02 :        1d8c8:   45 31 c9                xor    %r9d,%r9d
    0.00 :        1d8cb:   31 d2                   xor    %edx,%edx
    0.00 :        1d8cd:   48 3b 88 48 07 00 00    cmp    0x748(%rax),%rcx
    7.75 :        1d8d4:   7e 32                   jle    1d908 <get_ptr_constant+0x68>
    0.00 :        1d8d6:   eb 58                   jmp    1d930 <get_ptr_constant+0x90>
    0.00 :        1d8d8:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
    0.00 :        1d8df:   00
    7.68 :        1d8e0:   4c 8d 42 74             lea    0x74(%rdx),%r8
    0.00 :        1d8e4:   48 8b 0c d6             mov    (%rsi,%rdx,8),%rcx
    0.00 :        1d8e8:   48 03 4c d0 28          add    0x28(%rax,%rdx,8),%rcx
    0.00 :        1d8ed:   49 c1 e0 04             shl    $0x4,%r8
    7.89 :        1d8f1:   49 3b 0c 00             cmp    (%r8,%rax,1),%rcx
    0.00 :        1d8f5:   7c 39                   jl     1d930 <get_ptr_constant+0x90>
    0.01 :        1d8f7:   49 89 d0                mov    %rdx,%r8
    0.11 :        1d8fa:   49 c1 e0 04             shl    $0x4,%r8
    7.18 :        1d8fe:   4a 3b 8c 00 48 07 00    cmp    0x748(%rax,%r8,1),%rcx
    0.00 :        1d905:   00
    0.09 :        1d906:   7f 28                   jg     1d930 <get_ptr_constant+0x90>
         :          int i;
         :          npy_intp bd, _coordinates[NPY_MAXDIMS];
         :          PyArrayNeighborhoodIterObject *niter = (PyArrayNeighborhoodIterObject*)_iter;
         :          PyArrayIterObject *p = niter->_internal_iter;
         :
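In the annotation, a large share of the samples (roughly 8% each) lands on the compare-and-branch instructions of the per-dimension _INF_SET_PTR(i) loop, and get_ptr_constant accounts for about four times as much of the profile as get_ptr_simple. To make that difference concrete, here is a simplified, hypothetical sketch of the two kinds of lookup, assuming a contiguous C-ordered double array and constant padding outside its bounds. The names nbh_array, ptr_simple, ptr_constant and SKETCH_MAXDIMS are made up for illustration; this is not numpy's actual implementation.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAXDIMS 32  /* hypothetical cap, analogous to NPY_MAXDIMS */

/* Hypothetical n-d array used only for this sketch. */
typedef struct {
    int nd;                          /* number of dimensions (<= SKETCH_MAXDIMS) */
    ptrdiff_t dims[SKETCH_MAXDIMS];  /* shape */
    double *data;                    /* contiguous, C-ordered storage */
    double fill;                     /* value seen outside the array */
} nbh_array;

/* Row-major flat offset of in-bounds absolute coordinates. */
static ptrdiff_t flat_offset(const nbh_array *a, const ptrdiff_t *coords)
{
    ptrdiff_t off = 0;
    for (int i = 0; i < a->nd; ++i)
        off = off * a->dims[i] + coords[i];
    return off;
}

/* No padding logic: the caller guarantees center + offsets lies inside
   the array (checked only by assert in debug builds). */
double *ptr_simple(nbh_array *a, const ptrdiff_t *center, const ptrdiff_t *offsets)
{
    ptrdiff_t c[SKETCH_MAXDIMS];
    for (int i = 0; i < a->nd; ++i) {
        c[i] = center[i] + offsets[i];
        assert(c[i] >= 0 && c[i] < a->dims[i]);
    }
    return a->data + flat_offset(a, c);
}

/* Constant padding: two comparisons per dimension; any coordinate outside
   [0, dims[i]) yields a pointer to the fill value instead of into data. */
double *ptr_constant(nbh_array *a, const ptrdiff_t *center, const ptrdiff_t *offsets)
{
    ptrdiff_t c[SKETCH_MAXDIMS];
    for (int i = 0; i < a->nd; ++i) {
        c[i] = center[i] + offsets[i];
        if (c[i] < 0 || c[i] >= a->dims[i])
            return &a->fill;
    }
    return a->data + flat_offset(a, c);
}
```

The extra branch pair per dimension in ptr_constant is the kind of work that shows up as hot compare/jump instructions in an annotated profile; how much it costs in practice would depend on the number of dimensions and on how predictable the branches are.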
It works for C and Fortran, BTW.
cheers,
David