It's relatively easy to test replacing our custom allocators with the system ones, yes? Can we try those to see whether they have the same characteristic?
Yes.
PYTHONMALLOC=malloc LD_PRELOAD=/path/to/jemalloc.so python script.py
I will try it tomorrow.
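For anyone who wants to script the comparison, here is a minimal sketch of launching a child interpreter with a chosen allocator. It assumes CPython 3.6 or later, where the PYTHONMALLOC environment variable is honoured; the helper name run_with_allocator is made up for illustration and is not part of any library.

```python
import os
import subprocess
import sys


def run_with_allocator(code, allocator="malloc"):
    """Run `code` in a child interpreter with PYTHONMALLOC set.

    Hypothetical helper: `allocator` is passed through unchanged, so it
    must be a value CPython accepts (for example "malloc" or "pymalloc").
    """
    env = dict(os.environ)           # copy, so the parent environment is untouched
    env["PYTHONMALLOC"] = allocator  # read by the child at interpreter startup
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,                  # raise if the child exits with an error
    )
    return result.stdout
```

To try jemalloc the same way, LD_PRELOAD would also have to be added to env, pointing at wherever the library is installed on that machine.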
Can you simply test with the system allocator rather than jemalloc?
For the record, results for 10M nodes on Ubuntu 18.04, AWS r5a.4xlarge:
$ local/bin/python3 t1.py # default
1138.1123778309993 -- end train, start del
688.7927911250008 -- end
$ arena-1m/bin/python3 t1.py # Changed ARENA_SIZE to 1MiB
1085.3363994170013 -- end train, start del
84.57135540099989 -- end
$ PYTHONMALLOC=malloc local/bin/python3 t1.py
1157.4882792789995 -- end train, start del
27.919834706000074 -- end
$ LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1 PYTHONMALLOC=malloc local/bin/python3 t1.py
1098.4383037820007 -- end train, start del
117.93938426599925 -- end
In this case, glibc malloc is the fastest.
glibc malloc is known to be weak against fragmentation,
but an algorithm that avoids fragmentation is just overhead in this script.
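t1.py itself is not shown in this thread, so the following is only a rough sketch of the kind of measurement involved: allocate many small objects, then time how long dropping the last reference takes. The Node class and the measure function are hypothetical stand-ins, and the node count is kept small so that it runs quickly.

```python
import time


class Node:
    """Hypothetical small object; stands in for whatever t1.py allocates."""
    __slots__ = ("children",)

    def __init__(self):
        self.children = {}


def measure(n):
    """Return (build_seconds, free_seconds) for n nodes."""
    t0 = time.perf_counter()
    nodes = [Node() for _ in range(n)]  # allocation phase
    t1 = time.perf_counter()
    del nodes  # in CPython, dropping the last reference frees every node here
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1


if __name__ == "__main__":
    build_s, free_s = measure(200000)
    print(build_s, "-- end build, start del")
    print(free_s, "-- end")
```

Running this once by default and once with PYTHONMALLOC=malloc should show whether the deallocation phase behaves differently on a given machine, though a list of independent nodes may not reproduce the pattern seen in the real script.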
Regards,
--
Inada Naoki