On Fri, 6 Dec 2019 13:54:13 +0000 Rhodri James <rhodri@kynesim.co.uk> wrote:
> Apologies again for commenting in the wrong place.
>
> On 05/12/2019 16:38, Mark Shannon wrote:
>> Memory access is usually a limiting factor in the performance of modern CPUs. Better packing of data structures enhances locality and reduces memory bandwidth, at a modest increase in ALU usage (for shifting and masking).
> I don't think this assertion holds much water:
> 1. Caching makes memory access much less of a limit than you would expect.
> 2. Non-aligned memory accesses vary from inefficient to impossible depending on the processor.
> 3. Shifting and masking isn't free, and again on some processors can be very expensive.
I think your knowledge is outdated. Shifts and masks are extremely fast on modern CPUs, and unaligned loads are fast as well (when served from the CPU cache). Moreover, modern CPUs are superscalar, with many different execution units, so those instructions can be executed in parallel with other independent instructions. However, as soon as you load from main memory because of a cache miss, you take a hit of several hundred cycles. Basically, computations are almost free compared to the cost of memory accesses.

In any case, this will have to be judged on benchmark numbers, once Mark (or someone else) massages the interpreter to experiment with those runtime memory footprint reductions.

Regards

Antoine.
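P.S. For concreteness, here is a minimal sketch in C of the "shifting and masking" pattern under discussion. This is not actual CPython code; the field names and widths are made up purely for illustration:

    #include <stdint.h>

    /* Hypothetical packed layout: three small fields stored in a single
       32-bit word instead of three separate machine words.  Field names
       and widths are invented for this example. */
    #define OPCODE_BITS   8
    #define OPARG_BITS   16
    #define FLAGS_BITS    8

    #define OPARG_SHIFT   (OPCODE_BITS)                /* 8  */
    #define FLAGS_SHIFT   (OPCODE_BITS + OPARG_BITS)   /* 24 */

    static inline uint32_t
    pack(uint32_t opcode, uint32_t oparg, uint32_t flags)
    {
        /* Mask each field to its width, then shift it into place. */
        return (opcode & 0xffu)
             | ((oparg & 0xffffu) << OPARG_SHIFT)
             | ((flags & 0xffu) << FLAGS_SHIFT);
    }

    static inline uint32_t
    unpack_oparg(uint32_t word)
    {
        /* One shift and one mask: a couple of single-cycle ALU ops that
           the CPU can overlap with other independent instructions. */
        return (word >> OPARG_SHIFT) & 0xffffu;
    }

The trade-off is a few extra ALU operations per access in exchange for touching fewer cache lines; whether that pays off for the interpreter is exactly what the benchmarks will have to show.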