Perhaps relevant for perspective:

We did some review of the pyperformance benchmarks based on how noisy they are:
https://github.com/faster-cpython/ideas/discussions/142

Note that pidigits is the noisiest -- its performance changes by up to 11% for no good reason. The regex benchmarks are also very noisy.

On Mon, Jan 3, 2022 at 10:44 PM Gregory P. Smith <greg@krypto.org> wrote:

On Sun, Jan 2, 2022 at 2:37 AM Mark Dickinson <dickinsm@gmail.com> wrote:
On Sat, Jan 1, 2022 at 9:05 PM Antoine Pitrou <antoine@python.org> wrote:
Note that ARM is merely an architecture with very diverse
implementations having quite differing performance characteristics.  [...]

Understood. I'd be happy to see timings on a Raspberry Pi 3, say. I'm not too worried about things like the RPi Pico - that seems like it would be more of a target for MicroPython than CPython.

Wikipedia thinks, and the ARM architecture manuals seem to confirm, that most 32-bit ARM instruction sets _do_ support the UMULL 32-bit-by-32-bit-to-64-bit multiply instruction. (From https://en.wikipedia.org/wiki/ARM_architecture#Arithmetic_instructions: "ARM supports 32-bit × 32-bit multiplies with either a 32-bit result or 64-bit result, though Cortex-M0 / M0+ / M1 cores don't support 64-bit results.") Division may still be problematic.
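For context on why the 32-bit × 32-bit to 64-bit multiply matters here, a minimal worked sketch. It assumes one step of schoolbook bignum multiplication computes `z + a*b + carry`, with all four operands being single digits (below 2**shift). That is the general shape of such a loop, not CPython's actual code, and `worst_case_step` is a hypothetical name used only for illustration.

```python
# Hypothetical worked example: worst case of one multiply-accumulate step,
#     acc = z + a * b + carry
# where z, a, b and carry are each a single digit (i.e. < 2**shift).
# This mirrors the general shape of schoolbook bignum multiplication;
# it is not CPython's actual source.


def worst_case_step(shift: int) -> int:
    """Largest possible accumulator value for a given digit width."""
    max_digit = (1 << shift) - 1
    return max_digit + max_digit * max_digit + max_digit  # == 2**(2*shift) - 1


for shift in (15, 30):
    worst = worst_case_step(shift)
    print(
        f"{shift}-bit digits: needs {worst.bit_length()} bits; "
        f"fits in 32 bits: {worst < 2**32}; fits in 64 bits: {worst < 2**64}"
    )
```

The worst case is exactly 2**(2*shift) - 1, i.e. 30 bits for 15-bit digits and 60 bits for 30-bit digits. So a 15-bit-digit build can stay within 32-bit arithmetic, while a 30-bit-digit build needs a 64-bit product for every digit pair, which is what UMULL provides on most 32-bit ARM cores.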

It's rather irrelevant anyway; the Pi Zero/One is the lowest-spec ARM that matters at all. Nobody is ever going to ship something worse than that capable of running CPython.

Anyway, I ran actual benchmarks on a Pi 3. On 32-bit Raspbian I built CPython 3.10 twice, once with no configure flags (15-bit digits) and once with --enable-big-digits (or however that's spelled) for 30-bit digits, and ran pyperformance 1.0.2 on both.
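Before comparing two such builds it is worth confirming which digit size each interpreter actually uses. A small sketch using `sys.int_info` and `sysconfig.get_config_var("CONFIG_ARGS")`, both standard-library APIs; `digit_label` is a hypothetical helper whose output merely mirrors the 15bit.json / 30bit.json file names used in the comparison.

```python
import sys
import sysconfig

# sys.int_info describes how the running interpreter stores ints internally.
info = sys.int_info
print("bits per digit:", info.bits_per_digit)   # 15 or 30
print("bytes per digit:", info.sizeof_digit)    # 2 or 4

# The configure arguments this interpreter was built with
# (may be None or empty, e.g. on Windows builds).
print("configure args:", sysconfig.get_config_var("CONFIG_ARGS"))


def digit_label() -> str:
    """Hypothetical helper: label result files like 15bit.json / 30bit.json."""
    return f"{sys.int_info.bits_per_digit}bit"


print("suggested result file:", digit_label() + ".json")
```

Running this with each freshly built interpreter guards against accidentally benchmarking the same configuration twice.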

Caveat: this is not a good system to run benchmarks on. Performance varies widely (it has a tiny heatsink, which never got meaningfully hot), and the storage is a random microSD card. Each full pyperformance run took 6 hours. :P

Results basically say: no notable difference. Most benchmarks do not change, and the variability is quite high (look at those stddevs and how they overlap on the few benchmarks that produced a "significant" result at all). Benchmarks wholly unrelated to integers, such as the various regex ones, showing up as faster demonstrate how unreliable the numbers are. They also show how pointless it is to care about this fine a level of performance detail on this platform.

```
pi@pi3$ pyperf compare_to 15bit.json 30bit.json
2to3: Mean +- std dev: [15bit] 7.88 sec +- 0.39 sec -> [30bit] 8.02 sec +- 0.36 sec: 1.02x slower
crypto_pyaes: Mean +- std dev: [15bit] 3.22 sec +- 0.34 sec -> [30bit] 3.40 sec +- 0.22 sec: 1.06x slower
fannkuch: Mean +- std dev: [15bit] 13.4 sec +- 0.5 sec -> [30bit] 13.8 sec +- 0.5 sec: 1.03x slower
pickle_list: Mean +- std dev: [15bit] 74.7 us +- 22.1 us -> [30bit] 85.7 us +- 15.5 us: 1.15x slower
pyflate: Mean +- std dev: [15bit] 19.6 sec +- 0.6 sec -> [30bit] 19.9 sec +- 0.6 sec: 1.01x slower
regex_dna: Mean +- std dev: [15bit] 2.99 sec +- 0.24 sec -> [30bit] 2.81 sec +- 0.22 sec: 1.06x faster
regex_v8: Mean +- std dev: [15bit] 520 ms +- 71 ms -> [30bit] 442 ms +- 115 ms: 1.18x faster
scimark_monte_carlo: Mean +- std dev: [15bit] 3.31 sec +- 0.24 sec -> [30bit] 3.22 sec +- 0.24 sec: 1.03x faster
scimark_sor: Mean +- std dev: [15bit] 6.42 sec +- 0.34 sec -> [30bit] 6.27 sec +- 0.33 sec: 1.03x faster
spectral_norm: Mean +- std dev: [15bit] 4.85 sec +- 0.31 sec -> [30bit] 4.74 sec +- 0.20 sec: 1.02x faster
unpack_sequence: Mean +- std dev: [15bit] 1.42 us +- 0.42 us -> [30bit] 1.60 us +- 0.33 us: 1.13x slower

Benchmark hidden because not significant (47): chameleon, chaos, deltablue, django_template, dulwich_log, float, go, hexiom, json_dumps, json_loads, logging_format, logging_silent, logging_simple, mako, meteor_contest, nbody, nqueens, pathlib, pickle, pickle_dict, pickle_pure_python, pidigits, python_startup, python_startup_no_site, raytrace, regex_compile, regex_effbot, richards, scimark_fft, scimark_lu, scimark_sparse_mat_mult, sqlalchemy_declarative, sqlalchemy_imperative, sqlite_synth, sympy_expand, sympy_integrate, sympy_sum, sympy_str, telco, tornado_http, unpickle, unpickle_list, unpickle_pure_python, xml_etree_parse, xml_etree_iterparse, xml_etree_generate, xml_etree_process
```
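To make the overlap point concrete, a rough sketch that checks whether the mean ± 1 std dev bands of the two builds intersect for each row pyperf reported. The numbers are copied from the output above. This overlap check is a crude heuristic chosen for illustration, not the test pyperf itself uses to decide significance.

```python
# Crude heuristic (not pyperf's own significance test): do the
# mean +- 1 std dev bands of the two builds overlap? Numbers are copied
# from the pyperf output above; each row keeps the unit pyperf printed
# for it, which is fine because only within-row comparisons are made.

RESULTS = {
    # benchmark: ((mean_15bit, std_15bit), (mean_30bit, std_30bit))
    "2to3": ((7.88, 0.39), (8.02, 0.36)),                 # sec
    "crypto_pyaes": ((3.22, 0.34), (3.40, 0.22)),         # sec
    "fannkuch": ((13.4, 0.5), (13.8, 0.5)),               # sec
    "pickle_list": ((74.7, 22.1), (85.7, 15.5)),          # us
    "pyflate": ((19.6, 0.6), (19.9, 0.6)),                # sec
    "regex_dna": ((2.99, 0.24), (2.81, 0.22)),            # sec
    "regex_v8": ((520, 71), (442, 115)),                  # ms
    "scimark_monte_carlo": ((3.31, 0.24), (3.22, 0.24)),  # sec
    "scimark_sor": ((6.42, 0.34), (6.27, 0.33)),          # sec
    "spectral_norm": ((4.85, 0.31), (4.74, 0.20)),        # sec
    "unpack_sequence": ((1.42, 0.42), (1.60, 0.33)),      # us
}


def bands_overlap(a, b) -> bool:
    """True if the [mean - std, mean + std] intervals of a and b intersect."""
    (mean_a, std_a), (mean_b, std_b) = a, b
    return mean_a - std_a <= mean_b + std_b and mean_b - std_b <= mean_a + std_a


for name, (old, new) in RESULTS.items():
    ratio = new[0] / old[0]  # > 1 means the 30-bit build was slower
    print(f"{name:20} 30bit/15bit time: {ratio:.2f}  bands overlap: {bands_overlap(old, new)}")
```

By this crude measure every one of the eleven flagged rows overlaps, which fits the reading that the differences are mostly noise (though overlapping bands alone do not prove there is no difference).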

Rerunning just a few of those in --rigorous mode (more runs) does not significantly improve the stddev, so I'm not going to let that finish.
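That is what one would expect statistically: the sample standard deviation estimates the spread of individual runs, which is a property of the noisy machine, so extra runs only make that estimate more stable. What shrinks with more runs is the standard error of the mean (stddev / sqrt(n)). A minimal sketch with synthetic, normally distributed timings (made-up parameters, not the Pi 3 data); `summarize` is a hypothetical helper.

```python
import random
import statistics

# Synthetic timings (NOT the Pi 3 measurements): normal distribution with
# mean 1.0 and standard deviation 0.1, generated with a fixed seed.


def summarize(n: int, seed: int = 0) -> tuple[float, float]:
    """Return (sample stdev, standard error of the mean) for n synthetic runs."""
    rng = random.Random(seed)
    samples = [rng.gauss(1.0, 0.1) for _ in range(n)]
    stdev = statistics.stdev(samples)
    sem = stdev / n ** 0.5
    return stdev, sem


for n in (50, 5000):
    stdev, sem = summarize(n)
    # stdev stays near 0.1 regardless of n; sem falls roughly as 1/sqrt(n)
    print(f"n={n:5}: stdev={stdev:.3f}  standard error of mean={sem:.4f}")
```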

My recommendation: proceed with removing 15-bit bignum digit support. A 30-bit-only future with simpler, better code, here we come.

-gps
 

-- 
Mark

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/F53IZRZPNAKB4DUPOVYWGMQDC4DAWLTF/
Code of Conduct: http://python.org/psf/codeofconduct/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/5RJGI6THWCDYTTEPXMWXU7CK66RQUTD4/


--
--Guido van Rossum (python.org/~guido)
Pronouns: he/him (why is my pronoun here?)