On Apr 9, 2020, at 15:13, Wes Turner
- > And then take a look at how @ApacheArrow "supports zero-copy reads for lightning-fast data access without serialization overhead." - .@blazingsql … #cuDF … @ApacheArrow https://docs.blazingdb.com/docs/blazingsql
This isn’t relevant here at all. How objects get constructed and manage their internal storage is completely orthogonal to how Python manages object lifetimes.
… New #DataFrame Interface and when that makes a copy for 2x+ memory use - "A dataframe protocol for the PyData ecosystem" https://discuss.ossdata.org/t/a-dataframe-protocol-for-the-pydata-ecosystem/...
Same here.
Presumably, nothing about magic del statements would affect C extensions, Cython, zero-copy reads, or data that's copied to the GPU for faster processing; but I don't understand this or how weakrefs and c-extensions share memory that could be unlinked by a del.
And same for some of this, but not all. C extensions can do the same kind of frame hacking, etc., as Python code, so they will have the same problems already raised in this thread. But I don’t think they add anything new. (There are special rules allowing you to cheat with objects that haven’t been shared with Python code yet, which sounds like it would make things more complicated, until you realize that objects that haven’t been shared with Python code obviously can’t be affected by when Python code releases references.)

But weakrefs would be affected, and that might be a problem with the proposal that I don’t think anyone else has noticed before you. Consider this toy example:

    spam = make_giant_spam()
    weakspam = weakref.ref(spam)
    with ThreadPoolExecutor() as e:
        for _ in range(1000):
            e.submit(dostuff, weakspam)

Today, the spam variable lives until the end of the scope, which doesn’t happen until the with statement ends, which doesn’t happen until all 1000 tasks complete. So the object in that variable is still alive for all of the tasks.

With Guido’s proposed change, the spam variable is deleted after the last statement that uses it, which is before the with statement is even entered. Assuming it’s the only (non-weak) reference to the object, which is probably true, it will get destroyed, releasing all the memory (or other expensive resources) used by that giant spam object. That’s the whole point of the proposal, after all. But that means weakspam is now a dead weakref, so all those dostuff tasks are now doing stuff with a dead weakref. Presumably dostuff is designed to handle that safely, so you won’t crash or anything, but it can’t do the actual stuff you wanted it to do with that spam object. And, while this is obviously a toy example, perfectly reasonable real code will do similar things.
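To make the lifetime difference concrete, here is a minimal runnable sketch of the failure mode. (`Spam` is just a stand-in for the expensive object; the `del` stands in for what the proposal would do implicitly after the last use of the variable.)

```python
import weakref

class Spam:
    """Stand-in for a giant, expensive object (hypothetical name)."""

spam = Spam()
weakspam = weakref.ref(spam)
assert weakspam() is spam   # referent is alive while a strong ref exists

del spam                    # what the proposal would do implicitly
# CPython's refcounting destroys the object at once, so the weakref
# is already dead before any submitted "task" could use it:
assert weakspam() is None
```

In today’s semantics the weakref stays valid until the scope ends; under the proposal it dies at the `del` point, which is the whole problem.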
It’s pretty common to use weakrefs for cases where 99% of the time the object is there but occasionally it’s dead (e.g., during graceful shutdown), and changing that 99% to 0% or 1% will make the entire process useless. It’s also common to use weakrefs for cases where 80% of the time the object is there but 20% of the time it’s been ejected from some cache and has to be regenerated; changing that 80% to 1% means the process still functions, but the cache is no longer doing anything, so it functions a lot slower. And so on. So, unless you could introduce some compiler magic to detect weakref.ref and weakref.WeakValueDictionary.__setitem__ and so on (which might not be feasible, especially since the call is often buried inside some wrapper code), this proposal might well break many, maybe even most, good uses of weakrefs.
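The cache case can be sketched with the stdlib’s real weakref.WeakValueDictionary (`Page` and `get_page` are hypothetical names). Under the proposal, the caller’s strong reference would be dropped right after its last use, so nearly every lookup would miss and trigger regeneration:

```python
import weakref

class Page:
    """Hypothetical expensive-to-build cached object."""
    def __init__(self, key):
        self.key = key

_cache = weakref.WeakValueDictionary()

def get_page(key):
    """Return the cached Page if it's still alive, else rebuild it."""
    page = _cache.get(key)
    if page is None:
        page = Page(key)        # expensive regeneration in real code
        _cache[key] = page
    return page

p = get_page("index")
assert get_page("index") is p       # cache hit while a strong ref exists
del p                               # last strong reference dropped...
assert _cache.get("index") is None  # ...entry is gone; next call rebuilds
```

The cache only helps for as long as some strong reference keeps the value alive; aggressive implicit deletion shortens exactly that window.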
I’d be interested to see the real performance impact of this potential optimization: - 10%: https://instagram-engineering.com/dismissing-python-garbage-collection-at-in...
Skimming this, it looks like this one is not just orthogonal to Guido’s proposal, it’s almost directly counter to it. Their goal is to have relatively short-lived killable children that defer refcount twiddling and destruction as much as possible, so that fork-inherited objects don’t have to be copied and temporary objects don’t have to be cleaned up; they can just be abandoned. Guido’s goal is to get things decref’d, and therefore hopefully destroyed, as early as possible.

Anyway, their optimization is definitely useful for a special class of programs that meet some requirements that sound unusual, until you realize a lot of web servers/middlewares are designed around nearly the same requirements. People have done similar things (in fact, even more radical, akin to building CPython and all of your extensions with refcounting completely disabled) in C and other languages, and there’s no reason (if you’re really careful) it couldn’t work in Python. But it’s certainly not the behavior you’d want from a general-purpose Python implementation.
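For reference, the pattern the Instagram post describes maps onto the real gc.freeze()/gc.unfreeze() API that landed in CPython 3.7. A hedged sketch (the fork/worker part is elided, and the warm state is a placeholder):

```python
import gc

# Placeholder for the large, fork-inherited state a pre-fork server
# warms up in the parent process:
warm_state = {"config": "lots of fork-inherited data"}

gc.collect()   # collect actual garbage before freezing
gc.freeze()    # move every currently tracked object into the
               # "permanent" generation, which the collector skips
assert gc.get_freeze_count() > 0

# ... a real server would os.fork() worker children here; each child
# runs with gc.disable(), or collects only its own post-fork garbage,
# so copy-on-write pages holding inherited objects stay shared ...

gc.unfreeze()  # thaw again (e.g., in tests or on reconfiguration)
assert gc.get_freeze_count() == 0
```

The point of freezing is to keep the collector’s bookkeeping from dirtying copy-on-write pages, which is the opposite trade-off from destroying objects as early as possible.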