On 2018-09-13, Antoine Pitrou wrote:
> [...] it's not difficult to imagine that adding a conditional branch in the critical paths of Py_INCREF() and Py_DECREF() would significantly slow down Python as well.
I finished the benchmarking last night. Hopefully I didn't mess it up, since it takes a long time to run. Turning Py_TYPE(), Py_INCREF(), and Py_DECREF() into inline functions and adding a conditional branch to check for a tag costs roughly 8%. See below. That's worse than I hoped but not as bad as I feared.
I still hope that actually using tagged fixed ints could recover that 8%. Anything not in the small int cache requires a heap allocation, and that must be pretty expensive. Obviously real applications would have to get faster, not just int-heavy micro-benchmarks.
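To make the hoped-for saving concrete, here is a minimal sketch of how a small integer could live inside the pointer word itself, so that no heap object is allocated. It is not code from the experiment: the names tagged_ref, tag_int, untag_int, fits_in_tag and tagged_add are hypothetical, the low bit set to 1 is assumed to mark an integer (the same convention as the IS_TAGGED macro quoted later in this message), and two's complement conversion between uintptr_t and intptr_t is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names, not CPython API. A word with the low bit set
 * carries an integer payload in the remaining bits. */
typedef uintptr_t tagged_ref;

/* One bit is spent on the tag, so only half the intptr_t range fits. */
#define TAG_INT_MAX (INTPTR_MAX / 2)
#define TAG_INT_MIN (INTPTR_MIN / 2)

static inline bool fits_in_tag(intptr_t v)
{
    return v >= TAG_INT_MIN && v <= TAG_INT_MAX;
}

static inline bool is_tagged(tagged_ref r)
{
    return (r & 1u) != 0;
}

/* Shift the value up one bit and set the tag; no allocation involved. */
static inline tagged_ref tag_int(intptr_t v)
{
    assert(fits_in_tag(v));
    return ((uintptr_t)v << 1) | 1u;
}

/* Clear the tag, then halve. The cleared word is even, so the division
 * is exact. Assumes two's complement uintptr_t -> intptr_t conversion. */
static inline intptr_t untag_int(tagged_ref r)
{
    assert(is_tagged(r));
    return (intptr_t)(r - 1u) / 2;
}

/* Add two tagged ints. Returns false when the result does not fit, in
 * which case a real interpreter would fall back to a heap-allocated int. */
static inline bool tagged_add(tagged_ref a, tagged_ref b, tagged_ref *out)
{
    /* Both operands lie within half the intptr_t range,
     * so the sum itself cannot overflow intptr_t. */
    intptr_t sum = untag_int(a) + untag_int(b);
    if (!fits_in_tag(sum)) {
        return false;
    }
    *out = tag_int(sum);
    return true;
}
```

The fast path in tagged_add touches no memory beyond its operands, which is where any gain over a heap-allocated int would have to come from. Whether that gain outweighs the cost of the extra tag checks is exactly what the benchmarks would need to show.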
> So, if you want tagged pointers without slowing everything other than small integer arithmetic, you probably need to ditch reference counting as well. And then, perhaps you should start by ditching reference counting, because that's much more ambitious and complicated than implementing tagged pointers ;-)
I think no one has been hoping to get rid of reference counting in Python for longer than me. My original cycle GC patch started as experiments with mark-and-sweep collection. About 20 years have gone past but maybe we will get there yet.
BTW, the logic for Py_INCREF was:
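    /* Low bit set means the word is a tagged value, not a real pointer. */
    #define IS_TAGGED(op) ((uint64_t)(op) & 1)

    inline void
    _Py_INCREF(PyObject *op)
    {
        /* Only real objects carry a reference count. */
        if (!IS_TAGGED(op)) {
            op->ob_refcnt++;
        }
    }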
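The message shows only the Py_INCREF logic. As a rough, self-contained sketch of how the same check might extend to Py_DECREF and Py_TYPE, the following uses stand-in types rather than the real CPython headers. Obj, Type, int_type, obj_incref, obj_decref and obj_type are all hypothetical names, and returning a fixed int type for tagged values is an assumption, not something stated in the post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for PyTypeObject and PyObject (hypothetical, simplified). */
typedef struct type_s { const char *name; } Type;
typedef struct obj_s { intptr_t refcnt; Type *type; } Obj;

/* Assumed: every tagged value reports this one type. */
static Type int_type = { "int" };

/* Same convention as the post: low bit set means "not a real pointer". */
#define IS_TAGGED(op) (((uintptr_t)(op)) & 1)

static inline void obj_incref(Obj *op)
{
    if (!IS_TAGGED(op)) {
        op->refcnt++;
    }
}

/* Returns 1 when the count reaches zero (the caller would deallocate),
 * 0 otherwise. Tagged values are never dereferenced and never freed. */
static inline int obj_decref(Obj *op)
{
    if (IS_TAGGED(op)) {
        return 0;
    }
    return --op->refcnt == 0;
}

/* The type lookup needs the same branch, since a tagged value
 * has no header to read the type from. */
static inline Type *obj_type(Obj *op)
{
    return IS_TAGGED(op) ? &int_type : op->type;
}
```

Every one of these helpers gains one test-and-branch on a path that previously had none, which is consistent with the slowdown showing up across nearly all benchmarks rather than only in integer-heavy ones.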
pyperformance results follow.
2to3: Mean +- std dev: [base] 307 ms +- 5 ms -> [funcs] 320 ms +- 2 ms: 1.04x slower (+4%)
chameleon: Mean +- std dev: [base] 9.48 ms +- 0.15 ms -> [funcs] 10.2 ms +- 0.6 ms: 1.08x slower (+8%)
chaos: Mean +- std dev: [base] 108 ms +- 1 ms -> [funcs] 119 ms +- 1 ms: 1.09x slower (+9%)
crypto_pyaes: Mean +- std dev: [base] 112 ms +- 1 ms -> [funcs] 121 ms +- 4 ms: 1.08x slower (+8%)
deltablue: Mean +- std dev: [base] 7.17 ms +- 0.22 ms -> [funcs] 7.78 ms +- 0.51 ms: 1.08x slower (+8%)
django_template: Mean +- std dev: [base] 122 ms +- 3 ms -> [funcs] 130 ms +- 5 ms: 1.07x slower (+7%)
dulwich_log: Mean +- std dev: [base] 76.8 ms +- 0.8 ms -> [funcs] 78.5 ms +- 1.0 ms: 1.02x slower (+2%)
fannkuch: Mean +- std dev: [base] 460 ms +- 7 ms -> [funcs] 501 ms +- 2 ms: 1.09x slower (+9%)
float: Mean +- std dev: [base] 111 ms +- 2 ms -> [funcs] 121 ms +- 1 ms: 1.09x slower (+9%)
genshi_text: Mean +- std dev: [base] 29.3 ms +- 0.5 ms -> [funcs] 30.5 ms +- 1.3 ms: 1.04x slower (+4%)
genshi_xml: Mean +- std dev: [base] 62.7 ms +- 0.9 ms -> [funcs] 66.7 ms +- 2.4 ms: 1.06x slower (+6%)
go: Mean +- std dev: [base] 247 ms +- 3 ms -> [funcs] 265 ms +- 3 ms: 1.07x slower (+7%)
hexiom: Mean +- std dev: [base] 9.93 ms +- 0.58 ms -> [funcs] 10.8 ms +- 0.1 ms: 1.08x slower (+8%)
html5lib: Mean +- std dev: [base] 93.3 ms +- 3.2 ms -> [funcs] 97.9 ms +- 3.1 ms: 1.05x slower (+5%)
json_dumps: Mean +- std dev: [base] 11.7 ms +- 0.2 ms -> [funcs] 12.4 ms +- 0.4 ms: 1.05x slower (+5%)
json_loads: Mean +- std dev: [base] 25.4 us +- 1.4 us -> [funcs] 26.7 us +- 0.5 us: 1.05x slower (+5%)
logging_format: Mean +- std dev: [base] 10.1 us +- 0.6 us -> [funcs] 10.6 us +- 0.2 us: 1.05x slower (+5%)
logging_silent: Mean +- std dev: [base] 201 ns +- 13 ns -> [funcs] 215 ns +- 6 ns: 1.07x slower (+7%)
logging_simple: Mean +- std dev: [base] 9.03 us +- 0.23 us -> [funcs] 9.60 us +- 0.27 us: 1.06x slower (+6%)
mako: Mean +- std dev: [base] 17.2 ms +- 0.4 ms -> [funcs] 18.1 ms +- 0.5 ms: 1.05x slower (+5%)
meteor_contest: Mean +- std dev: [base] 100 ms +- 2 ms -> [funcs] 104 ms +- 2 ms: 1.04x slower (+4%)
nbody: Mean +- std dev: [base] 119 ms +- 4 ms -> [funcs] 134 ms +- 6 ms: 1.12x slower (+12%)
nqueens: Mean +- std dev: [base] 94.3 ms +- 1.1 ms -> [funcs] 102 ms +- 2 ms: 1.08x slower (+8%)
pathlib: Mean +- std dev: [base] 19.7 ms +- 0.2 ms -> [funcs] 20.4 ms +- 0.2 ms: 1.04x slower (+4%)
pickle: Mean +- std dev: [base] 9.08 us +- 0.25 us -> [funcs] 9.25 us +- 0.27 us: 1.02x slower (+2%)
pickle_dict: Mean +- std dev: [base] 22.5 us +- 0.2 us -> [funcs] 20.8 us +- 1.0 us: 1.08x faster (-8%)
pickle_pure_python: Mean +- std dev: [base] 466 us +- 7 us -> [funcs] 501 us +- 26 us: 1.07x slower (+7%)
pidigits: Mean +- std dev: [base] 165 ms +- 1 ms -> [funcs] 170 ms +- 4 ms: 1.03x slower (+3%)
python_startup: Mean +- std dev: [base] 7.40 ms +- 0.11 ms -> [funcs] 7.48 ms +- 0.06 ms: 1.01x slower (+1%)
python_startup_no_site: Mean +- std dev: [base] 5.10 ms +- 0.03 ms -> [funcs] 5.18 ms +- 0.05 ms: 1.02x slower (+2%)
raytrace: Mean +- std dev: [base] 487 ms +- 7 ms -> [funcs] 532 ms +- 11 ms: 1.09x slower (+9%)
regex_compile: Mean +- std dev: [base] 182 ms +- 5 ms -> [funcs] 196 ms +- 6 ms: 1.08x slower (+8%)
regex_dna: Mean +- std dev: [base] 154 ms +- 2 ms -> [funcs] 157 ms +- 0 ms: 1.02x slower (+2%)
regex_v8: Mean +- std dev: [base] 22.1 ms +- 0.6 ms -> [funcs] 22.5 ms +- 0.4 ms: 1.02x slower (+2%)
richards: Mean +- std dev: [base] 71.6 ms +- 2.6 ms -> [funcs] 76.7 ms +- 1.9 ms: 1.07x slower (+7%)
scimark_fft: Mean +- std dev: [base] 316 ms +- 6 ms -> [funcs] 351 ms +- 2 ms: 1.11x slower (+11%)
scimark_lu: Mean +- std dev: [base] 172 ms +- 6 ms -> [funcs] 195 ms +- 7 ms: 1.13x slower (+13%)
scimark_monte_carlo: Mean +- std dev: [base] 103 ms +- 3 ms -> [funcs] 116 ms +- 5 ms: 1.13x slower (+13%)
scimark_sor: Mean +- std dev: [base] 188 ms +- 6 ms -> [funcs] 209 ms +- 6 ms: 1.11x slower (+11%)
scimark_sparse_mat_mult: Mean +- std dev: [base] 3.68 ms +- 0.13 ms -> [funcs] 4.23 ms +- 0.12 ms: 1.15x slower (+15%)
spectral_norm: Mean +- std dev: [base] 123 ms +- 5 ms -> [funcs] 143 ms +- 2 ms: 1.16x slower (+16%)
sqlalchemy_declarative: Mean +- std dev: [base] 161 ms +- 2 ms -> [funcs] 167 ms +- 4 ms: 1.04x slower (+4%)
sqlalchemy_imperative: Mean +- std dev: [base] 30.5 ms +- 0.8 ms -> [funcs] 33.1 ms +- 1.3 ms: 1.09x slower (+9%)
sqlite_synth: Mean +- std dev: [base] 2.90 us +- 0.10 us -> [funcs] 3.04 us +- 0.24 us: 1.05x slower (+5%)
sympy_expand: Mean +- std dev: [base] 424 ms +- 5 ms -> [funcs] 454 ms +- 14 ms: 1.07x slower (+7%)
sympy_integrate: Mean +- std dev: [base] 19.5 ms +- 0.1 ms -> [funcs] 20.7 ms +- 0.7 ms: 1.06x slower (+6%)
sympy_sum: Mean +- std dev: [base] 90.7 ms +- 0.8 ms -> [funcs] 96.4 ms +- 3.1 ms: 1.06x slower (+6%)
sympy_str: Mean +- std dev: [base] 185 ms +- 2 ms -> [funcs] 199 ms +- 6 ms: 1.08x slower (+8%)
telco: Mean +- std dev: [base] 5.98 ms +- 0.21 ms -> [funcs] 6.39 ms +- 0.47 ms: 1.07x slower (+7%)
tornado_http: Mean +- std dev: [base] 187 ms +- 4 ms -> [funcs] 193 ms +- 2 ms: 1.03x slower (+3%)
unpack_sequence: Mean +- std dev: [base] 47.9 ns +- 1.5 ns -> [funcs] 51.4 ns +- 1.4 ns: 1.07x slower (+7%)
unpickle_list: Mean +- std dev: [base] 3.73 us +- 0.04 us -> [funcs] 3.77 us +- 0.07 us: 1.01x slower (+1%)
unpickle_pure_python: Mean +- std dev: [base] 371 us +- 7 us -> [funcs] 403 us +- 14 us: 1.09x slower (+9%)
xml_etree_parse: Mean +- std dev: [base] 136 ms +- 3 ms -> [funcs] 142 ms +- 5 ms: 1.04x slower (+4%)
xml_etree_iterparse: Mean +- std dev: [base] 95.6 ms +- 0.9 ms -> [funcs] 101 ms +- 2 ms: 1.06x slower (+6%)
xml_etree_generate: Mean +- std dev: [base] 105 ms +- 4 ms -> [funcs] 112 ms +- 1 ms: 1.07x slower (+7%)
xml_etree_process: Mean +- std dev: [base] 84.3 ms +- 3.6 ms -> [funcs] 89.8 ms +- 1.0 ms: 1.07x slower (+7%)

Benchmark hidden because not significant (3): pickle_list, regex_effbot, unpickle