=> The performance of the benchmark depends on how it uses the CPU memory caches (L1, L2, L3).
I understand that in some cases more of the data fits into the fastest caches, so the benchmark is faster. But sometimes not all of it fits, so the benchmark is slower.
Maybe the problem is that the data lies close to memory page boundaries, or is not aligned to L1 cache lines, or something like that.
I think you misunderstand how caches work. Where a piece of data ends up in a cache depends on its memory address (the numeric value), and those addresses can differ between runs even with ASLR disabled. Depending on them, you either do or don't get cache collisions. How about you just accept that the results follow a statistical distribution, rather than there being one concrete "right" result? As I tried to explain to you before, even if you get the "right" result, it will at best be just one sample from that distribution.
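To make the address dependence concrete, here is a minimal sketch of how a set-associative cache picks a set: the line number (address divided by the line size) modulo the number of sets. The geometry below (64-byte lines, 8 ways, 32 KiB, so 64 sets) is an assumption typical of many L1 data caches, not a measurement of any particular CPU, and the helper names are hypothetical. A collision only causes evictions when more lines map to one set than it has ways. For larger caches (L2/L3) the set index also uses physical address bits that the OS assigns per run, which is one reason the outcome can change between runs regardless of ASLR.

```python
from collections import Counter

# Assumed cache geometry (hypothetical, typical of a 32 KiB 8-way L1d):
LINE_SIZE = 64                               # bytes per cache line
WAYS = 8                                     # lines each set can hold
NUM_SETS = 32 * 1024 // (LINE_SIZE * WAYS)   # = 64 sets


def cache_set(address: int) -> int:
    """Set index for an address: its line number modulo the number of sets."""
    return (address // LINE_SIZE) % NUM_SETS


def lines_per_set(addresses: list[int]) -> Counter:
    """Count the distinct cache lines that land in each set."""
    lines = {a // LINE_SIZE for a in addresses}
    return Counter(line % NUM_SETS for line in lines)


def has_conflicts(addresses: list[int]) -> bool:
    """True if some set would have to hold more lines than it has ways."""
    return max(lines_per_set(addresses).values()) > WAYS


# Nine accesses exactly 4096 bytes apart all map to the same set ...
strided = [0x10000 + i * 4096 for i in range(9)]
# ... while adding one extra cache line to the stride spreads them out.
padded = [0x10000 + i * (4096 + LINE_SIZE) for i in range(9)]

print([cache_set(a) for a in strided])  # [0, 0, 0, 0, 0, 0, 0, 0, 0]
print(has_conflicts(strided), has_conflicts(padded))  # True False
```

The same nine accesses either thrash one set or spread across nine sets purely because of their addresses, which is why a single run tells you little on its own.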