New issue 2974: GC + an api to Malloc's call to ReleaseFreeMemory for the “unreturned memory” case

Alex Kashirin:

In this case, the non-glibc malloc is TCMALLOC.

PyPy translates and builds successfully against TCMALLOC with the following flags preset:


LDFLAGS="-DTCMALLOC -ltcmalloc -lunwind \
-fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free" \
python ../../rpython/bin/rpython \
--no-shared --thread --make-jobs=8 \
--opt=jit targetpypystandalone.py --allworkingmodules 

For reference, the full build script is at https://github.com/kashirin-alex/environments-builder/blob/master/scripts/pypy3.sh#L11

With the attached test, the results are great: with a peak of ~24 GB, the final RSS is 128 MB for a steady consumption of 64 MB, while the initial RSS was 56 MB.

For testing purposes, the test command was:



A TCMALLOC_RELEASE_RATE of 1000 is quite high; in my evaluation it releases 100% of what can be released each time it is possible, while 100 releases around 10% of what is possible.

A real improvement would be an API in the PyPy GC that calls the malloc implementation's memory-release function.
In the case of TCMALLOC, that is ReleaseFreeMemory().

For a build with the compiler defines -DTCMALLOC or -DTCMALLOC_MINIMAL,
something like the following would need to be added to the PyPy GC, with a corresponding Python API:


#if defined(TCMALLOC) || defined(TCMALLOC_MINIMAL)
#include <gperftools/malloc_extension.h>
void ...pypy..GC.. malloc_release(){
    MallocExtension::instance()->ReleaseFreeMemory();
}
#endif

This way the memory can be released as needed, and not only at the configured RELEASE_RATE.
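
To illustrate what a Python-level hook for this could look like, here is a minimal sketch that reaches tcmalloc's C entry point through ctypes. It assumes the interpreter is dynamically linked against libtcmalloc, so that MallocExtension_ReleaseFreeMemory (declared in gperftools/malloc_extension_c.h) is visible in the running process. The name release_free_memory is hypothetical and is not an existing PyPy API; a real implementation would live inside the GC rather than go through ctypes.

```python
import ctypes


def release_free_memory():
    """Ask tcmalloc to hand its freelist memory back to the OS.

    Returns True if the tcmalloc entry point was found and called,
    False if tcmalloc is not visible in this process.
    Hypothetical helper, not an existing PyPy API.
    """
    try:
        # CDLL(None) exposes symbols already loaded into the current
        # process (POSIX); it is not supported on Windows.
        process = ctypes.CDLL(None)
        release = process.MallocExtension_ReleaseFreeMemory
    except (OSError, AttributeError, TypeError):
        # Symbol missing (tcmalloc not linked) or platform unsupported.
        return False
    release.argtypes = []
    release.restype = None
    release()
    return True


if __name__ == "__main__":
    print("released via tcmalloc:", release_free_memory())
```

The intended use would be to call it after a large batch of objects has been dropped and collected. Without tcmalloc in the process it simply returns False, so the same script can run on a stock build.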

The memtest.py results look as follows:

Total memory consumed:
    GC used:            15.1MB (peak: 24473.6MB)
       in arenas:            4.7MB
       rawmalloced:          2.4MB
       nursery:              8.0MB
    raw assembler used: 57.1kB
    Total:              15.2MB

    Total memory allocated:
    GC allocated:            16.4MB (peak: 24474.1MB)
       in arenas:            5.6MB
       rawmalloced:          24460.1MB
       nursery:              8.0MB
    raw assembler allocated: 1.0MB
    Total:                   17.4MB

    Total time spent in GC:  11.225

steady consumed duration: 22.838454008102417
steady num tests: 207
test load duration: 54.853511571884155
initial_rss: 0.056 GB
actual consumed: 0.119 GB
MALLOC:       19724440 (   18.8 MiB) Bytes in use by application
MALLOC: +     88956928 (   84.8 MiB) Bytes in page heap freelist
MALLOC: +       479624 (    0.5 MiB) Bytes in central cache freelist
MALLOC: +            0 (    0.0 MiB) Bytes in transfer cache freelist
MALLOC: +       849376 (    0.8 MiB) Bytes in thread cache freelists
MALLOC: +     39059456 (   37.2 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =    149069824 (  142.2 MiB) Actual memory used (physical + swap)
MALLOC: +  28540633088 (27218.5 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =  28689702912 (27360.6 MiB) Virtual address space used
MALLOC:            347              Spans in use
MALLOC:              2              Thread heaps in use
MALLOC:           8192              Tcmalloc page size
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.
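
To make the arithmetic in the tcmalloc stats above easier to check, here is a small sketch that parses the byte-valued MALLOC lines and verifies that the components add up to the reported totals. The function names and the regular expression are my own illustration, not part of gperftools; the sample text is copied from the output above.

```python
import re

# Sample copied from the tcmalloc stats in this report.
SAMPLE = """\
MALLOC:       19724440 (   18.8 MiB) Bytes in use by application
MALLOC: +     88956928 (   84.8 MiB) Bytes in page heap freelist
MALLOC: +       479624 (    0.5 MiB) Bytes in central cache freelist
MALLOC: +            0 (    0.0 MiB) Bytes in transfer cache freelist
MALLOC: +       849376 (    0.8 MiB) Bytes in thread cache freelists
MALLOC: +     39059456 (   37.2 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =    149069824 (  142.2 MiB) Actual memory used (physical + swap)
MALLOC: +  28540633088 (27218.5 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =  28689702912 (27360.6 MiB) Virtual address space used
"""

# Matches "MALLOC: [+|=] <bytes> ( <n> MiB) <label>".
# Separator lines and count-only lines (e.g. "Spans in use") do not match.
STAT_LINE = re.compile(r"^MALLOC:\s*[+=]?\s*(\d+)\s+\(\s*[\d.]+\s+MiB\)\s+(.+)$")

# Labels whose values sum to "Actual memory used (physical + swap)".
RESIDENT_LABELS = [
    "Bytes in use by application",
    "Bytes in page heap freelist",
    "Bytes in central cache freelist",
    "Bytes in transfer cache freelist",
    "Bytes in thread cache freelists",
    "Bytes in malloc metadata",
]


def parse_tcmalloc_stats(text):
    """Return {label: bytes} for every byte-valued MALLOC line."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match:
            stats[match.group(2).strip()] = int(match.group(1))
    return stats


def resident_bytes(stats):
    """Sum the components that make up the physical + swap figure."""
    return sum(stats[label] for label in RESIDENT_LABELS)


if __name__ == "__main__":
    stats = parse_tcmalloc_stats(SAMPLE)
    print("resident sum:", resident_bytes(stats))
    print("page heap freelist:", stats["Bytes in page heap freelist"])
```

In this sample the page heap freelist (84.8 MiB) is roughly 60% of the 142.2 MiB still resident, which is the part an explicit ReleaseFreeMemory() call is meant to give back.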

If you would like to have the pypy3 build as a package, let me know; it requires some  .

