Note that ARM is merely an architecture with very diverse
implementations having quite differing performance characteristics. [...]
Understood. I'd be happy to see timings on a Raspberry Pi 3, say. I'm not too worried about things like the RPi Pico - that seems like it would be more of a target for MicroPython than CPython.
Wikipedia thinks, and the ARM architecture manuals seem to confirm, that most 32-bit ARM instruction sets _do_ support the UMULL 32-bit-by-32-bit-to-64-bit multiply instruction. (From
https://en.wikipedia.org/wiki/ARM_architecture#Arithmetic_instructions: "ARM supports 32-bit × 32-bit multiplies with either a 32-bit result or 64-bit result, though Cortex-M0 / M0+ / M1 cores don't support 64-bit results.") Division may still be problematic.
--
Mark