[Python-Dev] Memory bitmaps for the Python cyclic garbage collector
INADA Naoki
songofacandy at gmail.com
Fri Sep 8 02:46:12 EDT 2017
Big +1. I love the idea.
str (especially, docstring), dict, and tuples are major memory eater in Python.
This may reduce tuple memory usage massively.
INADA Naoki <songofacandy at gmail.com>
On Fri, Sep 8, 2017 at 2:30 AM, Neil Schemenauer <neil at python.ca> wrote:
> Python objects that participate in cyclic GC (things like lists, dicts,
> sets but not strings, ints and floats) have extra memory overhead. I
> think it is possible to mostly eliminate this overhead. Also, while
> the GC is running, this GC state is mutated, which destroys
> copy-on-write optimizations. This change would mostly fix that
> issue.
>
> All objects that participate in cyclic GC have the Py_TPFLAGS_HAVE_GC
> bit set in their type. That causes an extra chunk of memory to be
> allocated *before* the ob_refcnt struct member. This is the PyGC_Head
> struct.
>
> The whole object looks like this in memory (PyObject pointer is at
> arrow):
>
> union __gc_head *gc_next;
> union __gc_head *gc_prev;
> Py_ssize_t gc_refs;
> -->
> Py_ssize_t ob_refcnt
> struct _typeobject *ob_type;
> [rest of PyObject members]
>
>
> So, 24 bytes of overhead on a 64-bit machine. The smallest Python
> object that can have a pointer to another object (e.g. a single PyObject
> * member) is 48 bytes. Removing PyGC_Head would cut the size of these
> objects in half.
>
> Carl Shaprio questioned me today on why we use a double linked-list and
> not the memory bitmap. I think the answer is that there is no good
> reason. We use a double linked list only due to historical constraints
> that are no longer present.
>
> Long ago, Python objects could be allocated using the system malloc or
> other memory allocators. Since we could not control the memory
> location, bitmaps would be inefficient. Today, we allocate all Python
> objects via our own function. Python objects under a certain size are
> allocated using our own malloc, obmalloc, and are stored in memory
> blocks known "arenas".
>
> The PyGC_Head struct performs three functions. First, it allows the GC
> to find all Python objects that will be checked for cycles (i.e. follow
> the linked list). Second, it stores a single bit of information to let
> the GC know if it is safe to traverse the object, set with
> PyObject_GC_Track(). Finally, it has a scratch area to compute the
> effective reference count while tracing refs (gc_refs).
>
> Here is a sketch of how we can remove the PyGC_Head struct for small
> objects (say less than 512 bytes). Large objects or objects created by
> a different memory allocator will still have the PyGC_Head overhead.
>
> * Have memory arenas that contain only objects with the
> Py_TPFLAGS_HAVE_GC flag. Objects like ints, strings, etc will be
> in different arenas, not have bitmaps, not be looked at by the
> cyclic GC.
>
> * For those arenas, add a memory bitmap. The bitmap is a bit array that
> has a bit for each fixed size object in the arena. The memory used by
> the bitmap is a fraction of what is needed by PyGC_Head. E.g. an
> arena that holds up to 1024 objects of 48 bytes in size would have a
> bitmap of 1024 bits.
>
> * The bits will be set and cleared by PyObject_GC_Track/Untrack()
>
> * We also need an array of Py_ssize_t to take over the job of gc_refs.
> That could be allocated only when GC is working and it only needs to
> be the size of the number of true bits in the bitmap. Or, it could be
> allocated when the arena is allocated and be sized for the full arena.
>
> * Objects that are too large would still get the PyGC_Head struct
> allocated "in front" of the PyObject. Because they are big, the
> overhead is not so bad.
>
> * The GC process would work nearly the same as it does now. Rather than
> only traversing the linked list, we would also have to crawl over the
> GC object arenas, check blocks of memory that have the tracked bit
> set.
>
> There are a lot of smaller details to work out but I see no reason
> why the idea should not work. It should significantly reduce memory
> usage. Also, because the bitmap and gc_refs are contiguous in
> memory, locality will be improved. Ćukasz Langa has mentioned that
> the current GC causes issues with copy-on-write memory in big
> applications. This change should solve that issue.
>
> To implement, I think the easiest path is to create new malloc to be
> used by small GC objects, e.g. gcmalloc.c. It would be similar to
> obmalloc but have the features needed to keep track of the bitmap.
> obmalloc has some quirks that makes it hard to use for this purpose.
> Once the idea is proven, gcmalloc could be merged or made to be a
> variation of obmalloc. Or, maybe just optimized and remain
> separate. obmalloc is complicated and highly optimized. So, adding
> additional functionality to it will be challenging.
>
> I believe this change would be ABI compatible.
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/songofacandy%40gmail.com
More information about the Python-Dev
mailing list