[Cython] Performance comparison with CPython for attribute/item access

Stefan Behnel stefan_ml at behnel.de
Sat Feb 16 16:39:44 EST 2019


Hi,

Raymond Hettinger wrote a micro benchmark script for comparing the
performance of basic attribute and item access patterns across Python
versions and build configurations, so I tested the initially committed
version with Cython.

https://github.com/python/cpython/blob/master/Tools/scripts/var_access_benchmark.py
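
For readers who have not looked at the script, the measurement idea can be
sketched roughly as below. This is only an illustration built on the standard
"timeit" module, not Raymond's actual code: the helper name "bench", the
unrolling factor of 10 and the trial counts are assumptions made for the
example.

```python
from timeit import repeat

steps = 10  # module-level global that the benchmark reads


def read_global(trials=range(1000)):
    # Ten global reads per iteration, so that the loop overhead is
    # amortised over several operations.
    for _ in trials:
        steps; steps; steps; steps; steps
        steps; steps; steps; steps; steps


def loop_overhead(trials=range(1000)):
    # Empty loop body: measures the cost of the loop itself.
    for _ in trials:
        pass


def bench(func, ops_per_call, number=200, rounds=5):
    # Best-of-N timing, converted to nanoseconds per operation.
    best = min(repeat(func, number=number, repeat=rounds))
    return best / (number * ops_per_call) * 1e9


if __name__ == "__main__":
    for func, ops in [(read_global, 10_000), (loop_overhead, 1_000)]:
        print(f"{bench(func, ops):7.1f} ns\t{func.__name__}")
```

Taking the minimum over several rounds is a common way to reduce noise from
other processes; the absolute numbers will differ from machine to machine.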

Results are below, comparing Cython (master) to CPython 3.8 (master), plus
a third run with all C compile-time optimisations disabled via the
"CYTHON_*" macros (no-opt).

Some things to note:

- Most operations in Cython are around 30-50% faster.

- C-level operations like local variable access are too fast to measure in
Cython (0.1-0.2 ns, about the same as the timing loop overhead).

- Setting class variables is very slow in both CPython and Cython, probably
for the same (as yet unknown) reason, possibly the method cache.
https://bugs.python.org/issue36012

- The dict version check for Python globals is worth it. Disabling it in
Cython with "-DCYTHON_USE_DICT_VERSIONS=0" slows down the lookup by 5x
(2.2ns -> 10ns).

- Disabling PyList optimisations with "-DCYTHON_USE_PYLIST_INTERNALS=0"
slows down the "list_append_pop" benchmark by 5x (21ns -> 102ns).

- The list append/pop optimisations seem to slow down non-lists
disproportionately, deques by about 3x compared to CPython (155.4 ns vs.
53.9 ns). That seems worth improving.
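
The slowdown factors quoted in the notes above are rounded. As a quick
sanity check, they can be recomputed from the numbers given in the notes and
in the result tables below. The helper name "slowdown" is just chosen for
this example.

```python
def slowdown(before_ns, after_ns):
    """Return how many times slower `after_ns` is than `before_ns`."""
    return after_ns / before_ns


# (before, after) timings in nanoseconds, copied from the notes and tables.
cases = {
    "read_global, dict versions disabled": (2.2, 10.0),
    "list_append_pop, PyList internals disabled": (21.0, 102.0),
    "deque_append_pop, CPython vs. Cython": (53.9, 155.4),
}

for name, (before, after) in cases.items():
    print(f"{name}: {slowdown(before, after):.1f}x")

# Prints:
# read_global, dict versions disabled: 4.5x
# list_append_pop, PyList internals disabled: 4.9x
# deque_append_pop, CPython vs. Cython: 2.9x
```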

Stefan


CPython 3.8 (63fa1cfece)
========================

Variable and attribute read access:
   5.4 ns       read_local
   6.0 ns       read_nonlocal
  15.7 ns       read_global
  23.5 ns       read_builtin
  23.1 ns       read_classvar_from_class
  20.4 ns       read_classvar_from_instance
  31.5 ns       read_instancevar
  25.4 ns       read_instancevar_slots
  23.8 ns       read_namedtuple
  34.5 ns       read_boundmethod

Variable and attribute write access:
   6.2 ns       write_local
   6.7 ns       write_nonlocal
  19.1 ns       write_global
 113.2 ns       write_classvar
  44.6 ns       write_instancevar
  33.0 ns       write_instancevar_slots

Data structure read access:
  23.5 ns       read_list
  24.0 ns       read_deque
  25.6 ns       read_dict

Data structure write access:
  26.0 ns       write_list
  27.1 ns       write_deque
  32.0 ns       write_dict

Stack (or queue) operations:
  61.6 ns       list_append_pop
  53.9 ns       deque_append_pop

Timing loop overhead:
   0.4 ns       loop_overhead


Cython 3.0a0 (f1eaa9c1f)
========================

Variable and attribute read access:
   0.2 ns       read_local
   0.2 ns       read_nonlocal
   2.2 ns       read_global
   0.2 ns       read_builtin
  13.8 ns       read_classvar_from_class
  11.1 ns       read_classvar_from_instance
  21.3 ns       read_instancevar
  15.5 ns       read_instancevar_slots
  13.6 ns       read_namedtuple
  21.5 ns       read_boundmethod

Variable and attribute write access:
   0.2 ns       write_local
   0.1 ns       write_nonlocal
  13.0 ns       write_global
  92.9 ns       write_classvar
  29.6 ns       write_instancevar
  16.1 ns       write_instancevar_slots

Data structure read access:
   4.0 ns       read_list
   4.3 ns       read_deque
  16.5 ns       read_dict

Data structure write access:
   4.3 ns       write_list
   6.4 ns       write_deque
  21.4 ns       write_dict

Stack (or queue) operations:
  20.7 ns       list_append_pop
 155.4 ns       deque_append_pop

Timing loop overhead:
   0.1 ns       loop_overhead


Cython 3.0a0 (no-opt)
=====================

Variable and attribute read access:
   0.2 ns       read_local
   0.2 ns       read_nonlocal
  15.6 ns       read_global
   0.2 ns       read_builtin
  16.1 ns       read_classvar_from_class
  12.1 ns       read_classvar_from_instance
  21.9 ns       read_instancevar
  16.3 ns       read_instancevar_slots
  14.5 ns       read_namedtuple
  23.8 ns       read_boundmethod

Variable and attribute write access:
   0.2 ns       write_local
   0.2 ns       write_nonlocal
  14.2 ns       write_global
  99.4 ns       write_classvar
  35.0 ns       write_instancevar
  22.4 ns       write_instancevar_slots

Data structure read access:
   5.7 ns       read_list
   6.1 ns       read_deque
  21.1 ns       read_dict

Data structure write access:
   8.4 ns       write_list
   8.4 ns       write_deque
  24.0 ns       write_dict

Stack (or queue) operations:
  66.4 ns       list_append_pop
  75.1 ns       deque_append_pop

Timing loop overhead:
   0.2 ns       loop_overhead
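

The "30-50% faster" estimate can be checked against the first two tables.
The sketch below assumes that "faster" means the reduction in time per
operation relative to CPython (that reading is an assumption), and uses a
hand-picked subset of rows; the helper name "time_saved_percent" is made up
for this example.

```python
# (CPython ns, Cython ns) pairs copied from the first two tables.
timings = {
    "read_classvar_from_class": (23.1, 13.8),
    "read_instancevar": (31.5, 21.3),
    "read_namedtuple": (23.8, 13.6),
    "write_instancevar": (44.6, 29.6),
    "write_instancevar_slots": (33.0, 16.1),
    "read_dict": (25.6, 16.5),
    "write_classvar": (113.2, 92.9),
}


def time_saved_percent(cpython_ns, cython_ns):
    """Percentage of CPython's per-operation time that Cython saves."""
    return (1.0 - cython_ns / cpython_ns) * 100.0


for name, (cpython_ns, cython_ns) in timings.items():
    print(f"{time_saved_percent(cpython_ns, cython_ns):5.1f} %  {name}")
```

With these rows, the attribute and dict operations land between roughly 30%
and 51%, while write_classvar saves only about 18%, in line with the note
about class variable writes being slow in both implementations.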

