responses inline

-eric

On Wed, Mar 9, 2022 at 8:23 AM Petr Viktorin <encukou@gmail.com> wrote:
"periodically reset the refcount for immortal objects (only enable this if a stable ABI extension is imported?)" -- that sounds quite expensive, both at runtime and maintenance-wise.
Are you talking just about "(only enable this if a stable ABI extension is imported?)"? Such a check could certainly be expensive, but it doesn't have to be. However, I'm guessing that you are actually talking about the mechanism to periodically reset the refcount.

The actual periodic reset doesn't seem like it needs to be all that expensive overall. It would just need to be in a place that gets triggered often enough, but not so often that the extra cost of resetting the refcount would be a problem.

One important factor is whether we need to worry about potential de-immortalization for all immortal objects or only for a specific subset, like the most commonly used objects (at least, most commonly used by the problematic older stable ABI extensions). Mostly, we only need to be concerned with the objects that are likely to trigger de-immortalization in those extensions. Realistically, there aren't many potential immortal objects that would be exposed to the de-immortalization problem (e.g. None, True, False), so we could limit this workaround to them.

A variety of options come to mind. In each case we would reset the refcount of a given object if it is immortal. (We would also only do so if the refcount actually changed--to avoid cache invalidation and copy-on-write.)

If we need to worry about *all* immortal objects then I see several options:

1. in a single place where stable ABI extensions are likely to pass all objects often enough
2. in a single place where all objects pass through often enough

On the other hand, if we only need to worry about a fixed set of objects, the following options come to mind:

1. in a single place that is likely to be called by older stable ABI extensions
2. in a place that runs often enough, targeting a hard-coded group of immortal objects (common static globals like None)
   * perhaps in the eval breaker code, in exception handling, etc.
3. like (2), but rotating through subsets of the hard-coded group (to reduce the overall cost)
4. like (2), but spread out in type-specific code (e.g. static types could be reset in type_dealloc())

Again, none of these resets would sit in code that runs so often that the overhead would add up. (A sketch of option (2) follows below.)
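For concreteness, here's a minimal sketch of option (2), assuming a hard-coded group of the usual suspects and a hypothetical _Py_IMMORTAL_REFCNT constant for the initial immortal refcount (none of these names are settled):

    /* Sketch only: reset drifted refcounts for a hard-coded set of
       immortal objects. */
    static void
    reset_immortal_refcounts(void)
    {
        PyObject *targets[] = {Py_None, Py_True, Py_False};
        for (size_t i = 0; i < Py_ARRAY_LENGTH(targets); i++) {
            PyObject *op = targets[i];
            /* Only write if the count actually drifted, to avoid
               cache invalidation and copy-on-write. */
            if (Py_REFCNT(op) != _Py_IMMORTAL_REFCNT) {
                Py_SET_REFCNT(op, _Py_IMMORTAL_REFCNT);
            }
        }
    }

Called from, say, the eval breaker, that's a handful of predictable branches plus at most three writes per trigger.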
"provide a runtime flag for disabling immortality" also doesn't sound workable to me. We'd essentially need to run all tests twice every time to make sure it stays working.
Yeah, that makes it not worth it.
"Special-casing immortal objects in tp_dealloc() for the relevant types (but not int, due to frequency?)" sounds promising.
The "relevant types" are those for which we skip calling incref/decref entirely, like in Py_RETURN_NONE. This skipping is one of the optional optimizations, so we're entirely in control of if/when to apply it.
We would definitely do it for those types. NoneType and bool already have a tp_dealloc that calls Py_FatalError() if triggered. The tp_dealloc for str & tuple has special casing for some singletons that does likewise. In PyType_Type.tp_dealloc we have a similar assert for static types. In each case we would instead reset the refcount to the initial immortal value.

Regardless, in practice we may only need to worry (as noted above) about the most commonly used global objects, so perhaps we could stop there. However, that depends on the level of risk, and whether it warrants incurring the additional potential performance/maintenance costs. What is the likelihood of actual crashes due to pathological de-immortalization in older stable ABI extensions? I don't have a clear answer to offer on that, but I'd only expect it to be a problem if such extensions are used heavily in (very) long-running processes.
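As a rough illustration (a sketch, not an actual patch), NoneType's dealloc could revive the object instead of fatally erroring; _Py_IMMORTAL_REFCNT is again a stand-in name:

    /* Sketch: instead of Py_FatalError("deallocating None"), restore
       the immortal refcount so a stray unpaired decref from an older
       stable-ABI extension is harmless. */
    static void
    none_dealloc(PyObject *op)
    {
        Py_SET_REFCNT(op, _Py_IMMORTAL_REFCNT);
    }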
How much would it slow things back down if it wasn't done for ints at all?
I'll look into that. We're talking about the ~260 small ints, so it depends on how much they are used relative to all the other int objects that are used in a program.
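The cost in question is roughly one extra branch on every int dealloc, along these lines (the _is_small_int() range check is hypothetical):

    /* Sketch: per-dealloc cost of special-casing the small ints. */
    static void
    long_dealloc(PyObject *op)
    {
        if (_is_small_int(op)) {
            /* immortal singleton: revive it instead of freeing */
            Py_SET_REFCNT(op, _Py_IMMORTAL_REFCNT);
            return;
        }
        Py_TYPE(op)->tp_free(op);
    }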
Some more reasoning for not worrying about de-immortalizing in types without this optimization:

These objects will be de-immortalized with a refcount around 2^29, and then incref/decref go back to being paired properly. If 2^29 is much higher than the true reference count at de-immortalization, this'll just cause a memory leak at shutdown.

And it's probably OK to assume that the true reference count of an object can't be anywhere near 2^29: most of the time, to hold a reference you also need to have a pointer to the referenced object, and there ain't enough memory for that many pointers. This isn't a formally sound assumption, of course -- you can incref a million times with a single pointer if you pair the decrefs correctly. But it might be why we had no issues with "int won't overflow", an assumption which would fail with just 4× higher numbers.
Yeah, if we're dealing with properly paired incref/decref then the worry about crashing after de-immortalization is mostly gone. The problem is where in the runtime we would simply not call Py_INCREF() on certain objects because we know they are immortal. For instance, Py_RETURN_NONE (outside the older stable ABI) would no longer incref, while the problematic stable ABI extension would keep actually decref'ing until we crash. Again, I'm not sure what the likelihood of this case is. It seems very unlikely to me.
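To make the mismatch concrete, here's a hedged sketch (the optimized macro shown is hypothetical, not something we've shipped):

    /* Inside an immortal-objects CPython, returning None might skip
       the incref entirely (hypothetical): */
    #define Py_RETURN_NONE return Py_None

    /* Meanwhile, caller code built against an older stable ABI still
       gives back the reference it believes the callee took: */
    PyObject *res = PyObject_CallNoArgs(callable);  /* returns None */
    /* ... use res ... */
    Py_DECREF(res);  /* unpaired: None's refcount drifts downward */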
Of course, this argument would apply to immortalization and 64-bit builds as well. I wonder if there are holes in it :)
With the massive numbers involved on 64-bit, the problem is super unlikely, so I don't think we need to worry. Or perhaps I misunderstood your point?
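A back-of-envelope check, assuming (for illustration only) that the 64-bit immortal refcount starts somewhere around 2^62:

    2^62 unpaired decrefs ≈ 4.6 × 10^18
    at 10^9 decrefs/second → ~4.6 × 10^9 seconds ≈ 146 years

Even a tight loop of nothing but stray decrefs couldn't drag the count to zero within any realistic process lifetime.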
Oh, and if the "Special-casing immortal objects in tp_dealloc()" way is valid, refcount values 1 and 0 can no longer be treated specially. That's probably not a practical issue for the relevant types, but it's one more thing to think about when applying the optimization.
Given the low chance of the pathological case, the nature of the conditions where it might happen, and the specificity of 0 and 1 amongst all the possible values, I wouldn't consider this a problem.
There's also the other direction to consider: if an old stable-ABI extension does unpaired *increfs* on an immortal object, it'll eventually overflow the refcount. When the refcount is negative, decref will currently crash if built with Py_DEBUG, and I think we want to keep that check/crash. (Note that either Python itself or any extension could be built with Py_DEBUG.) Hopefully we can live with that, and hope anyone running with Py_DEBUG who hits it will send a useful report.
Yeah, the overflow case is not new. The only difference here (with the specialized tp_dealloc) is that we'd reset the refcount for immortal objects. Of course, we could adjust the check in Py_DECREF() to accommodate both situations and keep crashing like it does now.
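Something along these lines, perhaps (a sketch only; _Py_IsImmortal() is a hypothetical helper):

    /* Sketch of an adjusted decref that keeps the Py_DEBUG check. */
    static inline void
    _Py_DECREF_sketch(PyObject *op)
    {
        if (--op->ob_refcnt == 0) {
            _Py_Dealloc(op);  /* immortal types reset ob_refcnt here */
        }
    #ifdef Py_DEBUG
        else if (op->ob_refcnt < 0 && !_Py_IsImmortal(op)) {
            /* Keep crashing on genuine underflow, but tolerate an
               immortal object whose count wrapped past the sign bit. */
            Py_FatalError("negative refcount on non-immortal object");
        }
    #endif
    }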
Or is there another bit before the sign this'll mess up?
Nope, the sign bit is still highest-order.