On 18/09/2017 at 19:53, Nathaniel Smith wrote:
Why are reference cycles a problem that needs solving?
Because sometimes they are holding up costly resources in memory when people don't expect them to. Such as large Numpy arrays :-)
Do we have any reason to believe that this is actually happening on a regular basis though?
Define "regular" :-) We did get some reports on dask/distributed about it.
If it is then it might make sense to look at the cycle collection heuristics; IIRC they're based on a fairly naive count of how many allocations have been made, without regard to their size.
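To illustrate the point about the heuristics, here is a minimal sketch, assuming CPython. The `Holder` class and `make_garbage` function are hypothetical names made up for the example. It builds a single unreachable cycle that owns a large buffer: the collector's thresholds count tracked container objects, not bytes, so a handful of allocations does not trigger a run no matter how large the buffer is.

```python
import gc
import weakref


class Holder:
    """Hypothetical small object that owns a large buffer and refers to itself."""

    def __init__(self, size):
        self.buffer = bytearray(size)  # one allocation, many bytes
        self.me = self                 # self-reference creates a cycle


def make_garbage(size):
    holder = Holder(size)
    # The weak reference lets us observe the object without keeping it alive.
    return weakref.ref(holder)


gc.collect()                        # start from a clean state
thresholds = gc.get_threshold()     # per-generation thresholds (object counts)
ref = make_garbage(10 * 1024 * 1024)

alive_before = ref() is not None    # unreachable, but still in memory
gc.collect()                        # explicit full collection
alive_after = ref() is not None     # the cycle is gone now

print(thresholds, alive_before, alive_after)
```

The buffer only goes away once a collection actually runs, which is why its size plays no part in when that happens.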
Yes... But just because a lot of memory has been allocated isn't a good enough heuristic to launch a GC collection. What if that memory is gonna stay allocated for a long time? Then you're frequently launching GC runs for no tangible result except more CPU consumption and frequent pauses. Perhaps we could special-case tracebacks somehow, flag when a traceback remains alive after the implicit "del" clause at the end of an "except" block, then maintain some kind of linked list of the flagged tracebacks and launch specialized GC runs to find cycles across that collection. That sounds quite involved, though.
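For readers unfamiliar with the traceback case, here is a rough sketch of the kind of cycle in question, assuming CPython's reference counting. `Big` and `keeps_exception` are hypothetical names. The exception is copied to another local, so it survives the implicit "del", and everything in the frame stays alive until a GC run.

```python
import gc
import weakref


class Big:
    """Hypothetical stand-in for an expensive resource such as a large array."""

    def __init__(self):
        self.data = bytearray(10 * 1024 * 1024)


def keeps_exception():
    big = Big()
    ref = weakref.ref(big)
    try:
        raise ValueError("boom")
    except ValueError as exc:
        saved = exc  # outlives the implicit "del exc" at the end of the block
    # Cycle: saved -> traceback -> this frame -> locals (saved, big)
    return ref


gc.collect()
ref = keeps_exception()
alive_after_return = ref() is not None   # the frame is stuck in the cycle, and so is big
gc.collect()
alive_after_collect = ref() is not None  # only the cycle collector frees it

print(alive_after_return, alive_after_collect)
```

Without the `saved = exc` line, `big` would be released as soon as the function returned.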
The issue that Victor ran into with socket.create_connection is a special case where that function saves off the caught exception to use later.
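The "save the exception for later" pattern, together with one way of breaking the resulting cycle by hand, might look like the sketch below. This is not the actual socket.create_connection() code: `first_success`, `failing_attempt` and `AttemptFailed` are hypothetical names, and the behaviour shown assumes CPython's reference counting.

```python
import weakref


class AttemptFailed(Exception):
    """Hypothetical error raised by a failed attempt."""


def failing_attempt():
    raise AttemptFailed("unreachable")


def first_success(attempts):
    """Run callables in order; re-raise the last error if all of them fail."""
    err = None
    for attempt in attempts:
        try:
            return attempt()
        except AttemptFailed as exc:
            err = exc  # saved off to use later
    if err is None:
        raise ValueError("no attempts were given")
    try:
        raise err
    finally:
        # Break the err -> traceback -> frame -> err cycle explicitly.
        err = None


def run_and_observe():
    try:
        first_success([failing_attempt, failing_attempt])
    except AttemptFailed as exc:
        return weakref.ref(exc)  # observe the exception without keeping it alive


exc_ref = run_and_observe()
freed_without_gc = exc_ref() is None  # no gc.collect() was needed
print(freed_without_gc)
```

If the `err = None` line is dropped, the exception and the frame keep each other alive until the cycle collector runs.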
That's true... but you can find such special cases in an assorted bunch of library functions (stdlib or third-party), not just socket.create_connection(). Fixing them one by one is always possible, but it's a never-ending battle.

Regards

Antoine.