gc ideas -- dynamic profiling
Python organizes objects into 3 generations, ephemeral, short- and long-lived. When object is created it is place in ephemeral, if it lives long enough, it is move to short-lived and so on. q1 are generations placed in separate memory regions, or are all generations in one memory regions and there is a pointer that signifies the boundary between generations? I propose to track hot spots in python, that is contexts where most of allocations occur, and instrument these with counters that essentially tell how often an object generated here ends up killed in ephemeral, short-, long-lived garbage collector run or is in fac tstill alive. If a particular allocation context creates objects that are likely to be long-lived, allocator could skip frst 2 generations altogether (generations are separate regions) or preload the object with high survival count (if q1 is single region). On the other hand, if we know where most allocations occur, we can presume that most of these allocations are ephemeral, otherwise we run out of memory anyway, if this is indeed so, it makes my point moot. Implications are extra code to define context, extra pointer back to context from every allocation (or alterntively a weakref from allocation point to every object it generated) and real-time accounting as to what happens to these objects. It should be possible to approach this problem statistically, that is instrument only every 100s object or so. Context could be simple, e.g. bytecode operation 3 on line 45 in module "junk", or more complex, e.g. call stack str<-generator@238382<-function "foo"<-function "boo"<-...; or even patterns like ''' str<-any depth<-function "big" '''; clearly figuring out what hotspots are is already non-trivial, the more coplex the definition of context the more it borders downright impossible. p.s. can anyone share modern cpython profiling results to shed some light on how important gc optimization really is?
q1 are generations placed in separate memory regions, or are all generations in one memory regions and there is a pointer that signifies the boundary between generations?
You should really start reading the source code. See Modules/gcmodule.c. To answer your question: neither, nor. All objects are in one region, and there is no pointer separating the generations. Instead, all objects belonging to one generation form a double-linked list.
I propose to track hot spots in python, that is contexts where most of allocations occur, and instrument these with counters that essentially tell how often an object generated here ends up killed in ephemeral, short-, long-lived garbage collector run or is in fac tstill alive. If a particular allocation context creates objects that are likely to be long-lived, allocator could skip frst 2 generations altogether (generations are separate regions) or preload the object with high survival count (if q1 is single region).
We would consider such a proposal only if you had *actual evidence* that it improves things, rather than just having a reasoning that it might. Regards, Martin
participants (2)
-
"Martin v. Löwis"
-
Dima Tisnek