[issue11849] glibc allocator doesn't release all free()ed memory

Charles-François Natali report at bugs.python.org
Mon May 2 18:57:56 CEST 2011


Charles-François Natali <neologix at free.fr> added the comment:

I've had some time to look at this, and I've written a quick demo
patch that should - hopefully - fix this, and reduce memory
fragmentation.
 A little bit of background first:
 - a couple years ago (probably true when pymalloc was designed and
merged), glibc's malloc used brk for small and medium allocations, and
mmap for large allocations, to reduce memory fragmentation (also,
because of the processes' VM layout in older Linux 32-bit kernels, you
couldn't have a heap bigger than 1GB). The threshold for routing
requests to mmap was fixed, with a default of 128KB; since a pymalloc
arena is 256KB, all arenas were allocated with mmap
 - in 2006, a patch was merged to make this mmap threshold dynamic,
see http://sources.redhat.com/ml/libc-alpha/2006-03/msg00033.html for
more details
 - as a consequence, with modern glibc/eglibc versions, the first
arenas will be allocated through mmap, but as soon as one of them is
freed, subsequent arenas will be allocated from the heap through brk,
not mmap
 - imagine the following happens:
   1) program creates many objects
   2) to store those objects, many arenas are allocated from the heap
through brk
   3) program destroys all the objects created, except 1 which is in
the last allocated arena
   4) since the arena has at least one object in it, it's not
deallocated, and thus the heap doesn't shrink, and the memory usage
remains high (with a huge hole between the base of the heap and its
top)
 Note that 3) can be a single leaked reference, or just a variable
that doesn't get deallocated immediately. As an example, here's a demo
program that should exhibit this behaviour:

 """
 import sys
 import gc

 # allocate/de-allocate/re-allocate the array to make sure that arenas are
 # allocated through brk
 tab = []
 for i in range(1000000):
    tab.append(i)
 tab = []
 for i in range(1000000):
    tab.append(i)

 print('after allocation')
 sys.stdin.read(1)

 # allocate a dict at the top of the heap (actually it works even without this)
 a = {}

 # deallocate the big array
 del tab
 print('after deallocation')
 sys.stdin.read(1)

 # collect
 gc.collect()
 print('after collection')
 sys.stdin.read(1)
 """

 You should see that even after the big array has been deallocated and
collected, the memory usage doesn't decrease.
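To make the dynamic-threshold behaviour concrete, here's a toy Python model of the heuristic (a simplification written for illustration; ToyAllocator is my own name, real glibc tracks the threshold internally and caps it at a maximum, and 128KB is its historical default):

```python
# Toy model of glibc malloc's dynamic mmap threshold (simplified).
ARENA_SIZE = 256 * 1024  # pymalloc arena size

class ToyAllocator:
    def __init__(self, mmap_threshold=128 * 1024):
        self.mmap_threshold = mmap_threshold

    def malloc(self, size):
        # Requests at or above the threshold are served by mmap.
        return "mmap" if size >= self.mmap_threshold else "brk"

    def free(self, size, source):
        # Freeing an mmap'ed chunk raises the threshold: glibc assumes
        # blocks of that size are transient and better served from the heap.
        if source == "mmap" and size >= self.mmap_threshold:
            self.mmap_threshold = size + 1

alloc = ToyAllocator()
first = alloc.malloc(ARENA_SIZE)    # "mmap": above the initial threshold
alloc.free(ARENA_SIZE, first)       # threshold bumped past ARENA_SIZE
second = alloc.malloc(ARENA_SIZE)   # "brk": now below the threshold
print(first, second)                # mmap brk
```

Freeing a single mmap'ed arena-sized block is enough to push every later arena onto the brk heap, which is exactly the failure mode described above.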

 Also, there's another factor coming into play: the linked list of
arenas (the "arenas" variable in Objects/obmalloc.c), which is
expanded when there are not enough arenas allocated. If this variable
is realloc()ed while the heap is really large and has no hole in it,
it will be allocated from the top of the heap; and since it's not
resized when the number of used arenas goes down, it will remain at
the top of the heap and also prevent the heap from shrinking.
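The growth pattern of the "arenas" array can be sketched in Python (grow() is my own name, but it mirrors the doubling in new_arena(), and INITIAL_ARENA_OBJECTS is 16 in obmalloc.c; the point is that the array only ever grows):

```python
# Sketch of how obmalloc.c grows the "arenas" array: it starts at
# INITIAL_ARENA_OBJECTS entries and doubles on each expansion, but is
# never shrunk when arenas are released.
INITIAL_ARENA_OBJECTS = 16

def grow(maxarenas):
    # Mirrors new_arena(): double, or start at the initial size.
    return maxarenas << 1 if maxarenas else INITIAL_ARENA_OBJECTS

sizes = []
maxarenas = 0
while maxarenas < 1024:
    maxarenas = grow(maxarenas)
    sizes.append(maxarenas)
print(sizes)  # 16, 32, 64, ..., 1024 -- growth only, never shrinks
```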

 My demo patch (pymem.diff) thus does two things:
 1) use mallopt() to fix the mmap threshold, so that arenas are always
allocated through mmap
 2) increase the maximum size of requests handled by pymalloc from
256B to 512B (as discussed above with Antoine). The reason is that if
a PyObject_Malloc request is not handled by pymalloc from an arena
(i.e. it's greater than 256B) and is smaller than the mmap threshold,
then we can't do anything if it's not freed and remains in the middle
of the heap. That's exactly what's happening in the OP's case: some
dictionaries aren't deallocated even after the collection (I couldn't
quite identify them, but there seem to be some UTF-8 codecs and other
stuff)
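The routing change in 2) can be illustrated with pymalloc's size-class computation (a small sketch; route() is my own name, but the index formula and the class numbers match the table in obmalloc.c, with ALIGNMENT = 8):

```python
# pymalloc size-class routing, as computed in obmalloc.c:
# size class index = (size - 1) >> ALIGNMENT_SHIFT
ALIGNMENT_SHIFT = 3            # ALIGNMENT = 8 bytes
SMALL_REQUEST_THRESHOLD = 512  # patched value (was 256)

def route(size):
    """Return the pymalloc size class, or None for the system allocator."""
    if size == 0 or size > SMALL_REQUEST_THRESHOLD:
        return None  # routed to the underlying allocator
    return (size - 1) >> ALIGNMENT_SHIFT

print(route(256))  # 31: last class under the old 256B threshold
print(route(512))  # 63: last class with the patched threshold
print(route(513))  # None: still handed to the system allocator
```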

 To sum up, this patch greatly increases the likelihood of Python
objects being allocated from arenas, which should reduce fragmentation
(and seems to speed up certain operations quite a bit), and it ensures
that arenas are allocated through mmap, so that a single dangling
object doesn't prevent the heap from being trimmed.

 I've tested it on RHEL6 64-bit and Debian 32-bit, but it'd be great
if someone else could try it - and of course comment on the above
explanation/proposed solution.
Here's the result on Debian 32-bit:

Without patch:

*** Python 3.3.0 alpha
---   PID TTY      STAT   TIME  MAJFL   TRS   DRS   RSS %MEM COMMAND
  0  1843 pts/1    S+     0:00      1  1795   9892   7528  0.5 ./python /home/cf/issue11849_test.py
  1  1843 pts/1    S+     0:16      1  1795  63584  60928  4.7 ./python /home/cf/issue11849_test.py
  2  1843 pts/1    S+     0:33      1  1795 112772 109064  8.4 ./python /home/cf/issue11849_test.py
  3  1843 pts/1    S+     0:50      1  1795 162140 159424 12.3 ./python /home/cf/issue11849_test.py
  4  1843 pts/1    S+     1:06      1  1795 211376 207608 16.0 ./python /home/cf/issue11849_test.py
END  1843 pts/1    S+     1:25      1  1795 260560 256888 19.8 ./python /home/cf/issue11849_test.py
 GC  1843 pts/1    S+     1:26      1  1795 207276 204932 15.8 ./python /home/cf/issue11849_test.py

With patch:

*** Python 3.3.0 alpha
---   PID TTY      STAT   TIME  MAJFL   TRS   DRS   RSS %MEM COMMAND
  0  1996 pts/1    S+     0:00      1  1795  10160   7616  0.5 ./python /home/cf/issue11849_test.py
  1  1996 pts/1    S+     0:16      1  1795  64168  59836  4.6 ./python /home/cf/issue11849_test.py
  2  1996 pts/1    S+     0:33      1  1795 114160 108908  8.4 ./python /home/cf/issue11849_test.py
  3  1996 pts/1    S+     0:50      1  1795 163864 157944 12.2 ./python /home/cf/issue11849_test.py
  4  1996 pts/1    S+     1:07      1  1795 213848 207008 15.9 ./python /home/cf/issue11849_test.py
END  1996 pts/1    S+     1:26      1  1795  68280  63776  4.9 ./python /home/cf/issue11849_test.py
 GC  1996 pts/1    S+     1:26      1  1795  12112   9708  0.7 ./python /home/cf/issue11849_test.py

Antoine: since increasing the pymalloc threshold is part of the
solution to this problem, I'm attaching a standalone patch here
(pymalloc_threshold.diff); it's included in pymem.diff.
I'll try to post some pybench results tomorrow.

----------
Added file: http://bugs.python.org/file21858/pymem.diff
Added file: http://bugs.python.org/file21859/pymalloc_threshold.diff

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue11849>
_______________________________________
-------------- next part --------------
diff -r fd45c2452be3 Objects/obmalloc.c
--- a/Objects/obmalloc.c	Sat Apr 30 15:30:03 2011 +0200
+++ b/Objects/obmalloc.c	Mon May 02 18:41:41 2011 +0200
@@ -1,4 +1,5 @@
 #include "Python.h"
+#include <malloc.h>
 
 #ifdef WITH_PYMALLOC
 
@@ -75,7 +76,8 @@
  * Allocation strategy abstract:
  *
  * For small requests, the allocator sub-allocates <Big> blocks of memory.
- * Requests greater than 256 bytes are routed to the system's allocator.
+ * Requests greater than SMALL_REQUEST_THRESHOLD bytes are routed to the
+ * system's allocator.
  *
  * Small requests are grouped in size classes spaced 8 bytes apart, due
  * to the required valid alignment of the returned address. Requests of
@@ -107,10 +109,11 @@
  *       57-64                   64                       7
  *       65-72                   72                       8
  *        ...                   ...                     ...
- *      241-248                 248                      30
- *      249-256                 256                      31
+ *      497-504                 504                      62
+ *      505-512                 512                      63
  *
- *      0, 257 and up: routed to the underlying allocator.
+ *      0, SMALL_REQUEST_THRESHOLD + 1 and up: routed to the underlying
+ *      allocator.
  */
 
 /*==========================================================================*/
@@ -139,14 +142,17 @@
  * small enough in order to use preallocated memory pools. You can tune
  * this value according to your application behaviour and memory needs.
  *
+ * Note: a size threshold of 512 guarantees that newly created dictionaries
+ * will be allocated from preallocated memory pools on 64-bit.
+ *
  * The following invariants must hold:
- *      1) ALIGNMENT <= SMALL_REQUEST_THRESHOLD <= 256
+ *      1) ALIGNMENT <= SMALL_REQUEST_THRESHOLD <= 512
  *      2) SMALL_REQUEST_THRESHOLD is evenly divisible by ALIGNMENT
  *
  * Although not required, for better performance and space efficiency,
  * it is recommended that SMALL_REQUEST_THRESHOLD is set to a power of 2.
  */
-#define SMALL_REQUEST_THRESHOLD 256
+#define SMALL_REQUEST_THRESHOLD 512
 #define NB_SMALL_SIZE_CLASSES   (SMALL_REQUEST_THRESHOLD / ALIGNMENT)
 
 /*
@@ -440,6 +446,9 @@
     , PT(48), PT(49), PT(50), PT(51), PT(52), PT(53), PT(54), PT(55)
 #if NB_SMALL_SIZE_CLASSES > 56
     , PT(56), PT(57), PT(58), PT(59), PT(60), PT(61), PT(62), PT(63)
+#if NB_SMALL_SIZE_CLASSES > 64
+#error "NB_SMALL_SIZE_CLASSES should be less than 64"
+#endif /* NB_SMALL_SIZE_CLASSES > 64 */
 #endif /* NB_SMALL_SIZE_CLASSES > 56 */
 #endif /* NB_SMALL_SIZE_CLASSES > 48 */
 #endif /* NB_SMALL_SIZE_CLASSES > 40 */
@@ -545,6 +554,10 @@
         if (numarenas > PY_SIZE_MAX / sizeof(*arenas))
             return NULL;                /* overflow */
 #endif
+        /* Ensure arenas are allocated by mmap to avoid memory
+         * fragmentation. */
+        if (numarenas == INITIAL_ARENA_OBJECTS)
+            mallopt(M_MMAP_THRESHOLD, ARENA_SIZE);
         nbytes = numarenas * sizeof(*arenas);
         arenaobj = (struct arena_object *)realloc(arenas, nbytes);
         if (arenaobj == NULL)
-------------- next part --------------
diff -r bbfc65d05588 Objects/obmalloc.c
--- a/Objects/obmalloc.c	Thu Apr 07 10:48:29 2011 -0400
+++ b/Objects/obmalloc.c	Mon Apr 25 15:10:43 2011 +0200
@@ -75,7 +75,8 @@
  * Allocation strategy abstract:
  *
  * For small requests, the allocator sub-allocates <Big> blocks of memory.
- * Requests greater than 256 bytes are routed to the system's allocator.
+ * Requests greater than SMALL_REQUEST_THRESHOLD bytes are routed to the
+ * system's allocator.
  *
  * Small requests are grouped in size classes spaced 8 bytes apart, due
  * to the required valid alignment of the returned address. Requests of
@@ -107,10 +108,11 @@
  *       57-64                   64                       7
  *       65-72                   72                       8
  *        ...                   ...                     ...
- *      241-248                 248                      30
- *      249-256                 256                      31
+ *      497-504                 504                      62
+ *      505-512                 512                      63
  *
- *      0, 257 and up: routed to the underlying allocator.
+ *      0, SMALL_REQUEST_THRESHOLD + 1 and up: routed to the underlying
+ *      allocator.
  */
 
 /*==========================================================================*/
@@ -139,14 +141,17 @@
  * small enough in order to use preallocated memory pools. You can tune
  * this value according to your application behaviour and memory needs.
  *
+ * Note: a size threshold of 512 guarantees that newly created dictionaries
+ * will be allocated from preallocated memory pools on 64-bit.
+ *
  * The following invariants must hold:
- *      1) ALIGNMENT <= SMALL_REQUEST_THRESHOLD <= 256
+ *      1) ALIGNMENT <= SMALL_REQUEST_THRESHOLD <= 512
  *      2) SMALL_REQUEST_THRESHOLD is evenly divisible by ALIGNMENT
  *
  * Although not required, for better performance and space efficiency,
  * it is recommended that SMALL_REQUEST_THRESHOLD is set to a power of 2.
  */
-#define SMALL_REQUEST_THRESHOLD 256
+#define SMALL_REQUEST_THRESHOLD 512
 #define NB_SMALL_SIZE_CLASSES   (SMALL_REQUEST_THRESHOLD / ALIGNMENT)
 
 /*
@@ -440,6 +445,9 @@
     , PT(48), PT(49), PT(50), PT(51), PT(52), PT(53), PT(54), PT(55)
 #if NB_SMALL_SIZE_CLASSES > 56
     , PT(56), PT(57), PT(58), PT(59), PT(60), PT(61), PT(62), PT(63)
+#if NB_SMALL_SIZE_CLASSES > 64
+#error "NB_SMALL_SIZE_CLASSES should be less than 64"
+#endif /* NB_SMALL_SIZE_CLASSES > 64 */
 #endif /* NB_SMALL_SIZE_CLASSES > 56 */
 #endif /* NB_SMALL_SIZE_CLASSES > 48 */
 #endif /* NB_SMALL_SIZE_CLASSES > 40 */

