[Python-Dev] Memory bitmaps for the Python cyclic garbage collector

INADA Naoki songofacandy at gmail.com
Fri Sep 8 02:46:12 EDT 2017


Big +1.  I love the idea.

str (especially, docstring), dict, and tuples are major memory eater in Python.
This may reduce tuple memory usage massively.

INADA Naoki  <songofacandy at gmail.com>


On Fri, Sep 8, 2017 at 2:30 AM, Neil Schemenauer <neil at python.ca> wrote:
> Python objects that participate in cyclic GC (things like lists, dicts,
> sets but not strings, ints and floats) have extra memory overhead.  I
> think it is possible to mostly eliminate this overhead.  Also, while
> the GC is running, this GC state is mutated, which destroys
> copy-on-write optimizations.  This change would mostly fix that
> issue.
>
> All objects that participate in cyclic GC have the Py_TPFLAGS_HAVE_GC
> bit set in their type.  That causes an extra chunk of memory to be
> allocated *before* the ob_refcnt struct member.  This is the PyGC_Head
> struct.
>
> The whole object looks like this in memory (PyObject pointer is at
> arrow):
>
>     union __gc_head *gc_next;
>     union __gc_head *gc_prev;
>     Py_ssize_t gc_refs;
> -->
>     Py_ssize_t ob_refcnt
>     struct _typeobject *ob_type;
>     [rest of PyObject members]
>
>
> So, 24 bytes of overhead on a 64-bit machine.  The smallest Python
> object that can have a pointer to another object (e.g. a single PyObject
> * member) is 48 bytes.  Removing PyGC_Head would cut the size of these
> objects in half.
>
> Carl Shaprio questioned me today on why we use a double linked-list and
> not the memory bitmap.  I think the answer is that there is no good
> reason. We use a double linked list only due to historical constraints
> that are no longer present.
>
> Long ago, Python objects could be allocated using the system malloc or
> other memory allocators.  Since we could not control the memory
> location, bitmaps would be inefficient.  Today, we allocate all Python
> objects via our own function.  Python objects under a certain size are
> allocated using our own malloc, obmalloc, and are stored in memory
> blocks known "arenas".
>
> The PyGC_Head struct performs three functions.  First, it allows the GC
> to find all Python objects that will be checked for cycles (i.e. follow
> the linked list).  Second, it stores a single bit of information to let
> the GC know if it is safe to traverse the object, set with
> PyObject_GC_Track().  Finally, it has a scratch area to compute the
> effective reference count while tracing refs (gc_refs).
>
> Here is a sketch of how we can remove the PyGC_Head struct for small
> objects (say less than 512 bytes).  Large objects or objects created by
> a different memory allocator will still have the PyGC_Head overhead.
>
> * Have memory arenas that contain only objects with the
>   Py_TPFLAGS_HAVE_GC flag.  Objects like ints, strings, etc will be
>   in different arenas, not have bitmaps, not be looked at by the
>   cyclic GC.
>
> * For those arenas, add a memory bitmap.  The bitmap is a bit array that
>   has a bit for each fixed size object in the arena.  The memory used by
>   the bitmap is a fraction of what is needed by PyGC_Head.  E.g. an
>   arena that holds up to 1024 objects of 48 bytes in size would have a
>   bitmap of 1024 bits.
>
> * The bits will be set and cleared by PyObject_GC_Track/Untrack()
>
> * We also need an array of Py_ssize_t to take over the job of gc_refs.
>   That could be allocated only when GC is working and it only needs to
>   be the size of the number of true bits in the bitmap.  Or, it could be
>   allocated when the arena is allocated and be sized for the full arena.
>
> * Objects that are too large would still get the PyGC_Head struct
>   allocated "in front" of the PyObject.  Because they are big, the
>   overhead is not so bad.
>
> * The GC process would work nearly the same as it does now.  Rather than
>   only traversing the linked list, we would also have to crawl over the
>   GC object arenas, check blocks of memory that have the tracked bit
>   set.
>
> There are a lot of smaller details to work out but I see no reason
> why the idea should not work.  It should significantly reduce memory
> usage.  Also, because the bitmap and gc_refs are contiguous in
> memory, locality will be improved.  Łukasz Langa has mentioned that
> the current GC causes issues with copy-on-write memory in big
> applications.  This change should solve that issue.
>
> To implement, I think the easiest path is to create new malloc to be
> used by small GC objects, e.g. gcmalloc.c.  It would be similar to
> obmalloc but have the features needed to keep track of the bitmap.
> obmalloc has some quirks that makes it hard to use for this purpose.
> Once the idea is proven, gcmalloc could be merged or made to be a
> variation of obmalloc.  Or, maybe just optimized and remain
> separate.  obmalloc is complicated and highly optimized.  So, adding
> additional functionality to it will be challenging.
>
> I believe this change would be ABI compatible.
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/songofacandy%40gmail.com


More information about the Python-Dev mailing list