Thanks for removing the mystery.
FWIW, here are some of the docs and resources for memory management in
Python. I share these not to be obnoxious or to atone, but to point to the
docs that would need updating to explain what is going on if this is not
explicit.
- https://docs.python.org/3/reference/datamodel.html#object.__del__
- https://docs.python.org/3/extending/extending.html?highlight=__del__#thin-ic...
- https://docs.python.org/3/c-api/memory.html
- https://docs.python.org/3/library/gc.html
- https://docs.python.org/3/library/tracemalloc.html
- https://devguide.python.org/gdb/
- https://devguide.python.org/garbage_collector/
- https://devguide.python.org/garbage_collector/#optimization-reusing-fields-t...
- https://doc.pypy.org/en/latest/gc_info.html
- https://github.com/jythontools/jython/blob/master/src/org/python/modules/gc....
  https://javadoc.io/doc/org.python/jython-standalone/2.7.2/org/python/modules/gc.html
- https://github.com/IronLanguages/ironpython2/blob/master/Src/IronPython.Modu...
  https://github.com/IronLanguages/ironpython2/blob/master/Src/StdLib/Lib/test...
  https://github.com/IronLanguages/ironpython2/blob/master/Src/StdLib/Lib/test...
  https://github.com/IronLanguages/ironpython3/blob/master/Src/IronPython.Modu...
- "[Python-Dev] Re: Mixed Python/C debugging"
https://mail.python.org/archives/list/python-dev@python.org/message/Z3S2RAXR...
- @anthonypjshaw's CPython Internals book has a (will have a) memory
management chapter.
- > And then take a look at how @ApacheArrow
"supports zero-copy reads for lightning-fast data access without
serialization overhead."
- .@blazingsql … #cuDF … @ApacheArrow
https://docs.blazingdb.com/docs/blazingsql
… New #DataFrame Interface and when that makes a copy for 2x+ memory use
- "A dataframe protocol for the PyData ecosystem"
https://discuss.ossdata.org/t/a-dataframe-protocol-for-the-pydata-ecosystem/...
Presumably, nothing about magic del statements would affect C extensions,
Cython, zero-copy reads, or data that's copied to the GPU for faster
processing, but I don't fully understand this, or how weakrefs and
C extensions share memory that could be unlinked by a del.
Would be interested to see the real performance impact of this potential
optimization:
- 10%:
https://instagram-engineering.com/dismissing-python-garbage-collection-at-in...
On Thu, Apr 9, 2020 at 2:48 PM Andrew Barnert wrote:
On Apr 8, 2020, at 23:53, Wes Turner wrote:
Could something just heuristically add del statements with an AST
transformation that we could review with source control before committing?
When the gc pause occurs is something I don't fully understand. For
example:
Your examples don’t have anything to do with gc pause.
FWIW, this segfaults CPython in 2 lines:
    import ctypes
    ctypes.cast(1, ctypes.py_object)
Yes, because this is ultimately trying to print the repr of (PyObject*)1, which means calling some function that tries to dereference some member of a struct at address 1, which means trying to access an int or pointer or whatever at address 1 or 9 or 17 or whatever. On most platforms, those addresses are going to be unmapped (and, on some, illegally aligned to boot), so you’ll get a segfault. This has nothing to do with the GC, or with Python objects at all.
Interestingly, this (tends to?) work, even when there are, ah, scope closures?:
    import ctypes, gc
    x = 22
    _id = id(x)
    del x
    gc.collect()
    y = ctypes.cast(_id, ctypes.py_object).value
    assert y == 22
The gc.collect isn’t doing anything here.
First, the 22 object, like other small integers and a few other special cases, is immortal. Even after you del x, the object is still alive, so of course everything works.
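A minimal sketch of why the 22 object survives the del, assuming CPython (the small-integer cache is an implementation detail, not a language guarantee). The names `same_object` and `count` are only for illustration.

```python
import sys

x = 22
del x  # removes only the name; the cached int object itself stays alive

# int("22") builds the value at run time, yet CPython returns the cached object.
a = int("22")
b = int("22")
same_object = a is b

# The interpreter itself keeps references to the cached 22.
# (On CPython 3.12+, immortal objects report a very large fixed count.)
count = sys.getrefcount(22)
```

On other implementations, or for integers outside the cached range, `a is b` may well be False.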
Even if you used a normal object that does get deleted, it would get deleted immediately when the last reference to the value goes away, in that del x statement. The collect isn’t needed and doesn’t do anything relevant here. (It’s there to detect reference cycles, like `a.b=b; b.a=a; del a; del b`. Assuming a and b were the only references to their objects at the start, a.b and b.a are the only references at the end. They won’t be deleted by refcounting because there’s still one reference to each, but they are garbage because they’re not accessible. The gc.collect is a cycle detector that handles exactly this case.)
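The cycle case described above can be sketched roughly like this, assuming CPython. `Node` and `probe` are hypothetical names, and the weak reference is only a way to check liveness without keeping the object alive.

```python
import gc
import weakref

class Node:
    """Plain class; instances accept attributes and weak references."""

gc.disable()  # keep an automatic collection from running mid-demo
try:
    a, b = Node(), Node()
    a.b = b
    b.a = a
    probe = weakref.ref(a)  # observes a's object without owning it
    del a, b
    # Refcounting alone cannot free the pair: each still holds the other.
    alive_after_del = probe() is not None
    # The cycle detector finds the unreachable pair and frees it.
    found = gc.collect()
    alive_after_collect = probe() is not None
finally:
    gc.enable()
```

Here `gc.collect()` returns the number of unreachable objects it found, which includes at least the two `Node` instances.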
But your code may well still often work on most platforms. Deleting an object rarely unmaps its memory; it just returns that memory to the object allocator’s store. Eventually that memory will be reused for another object, but until it is, it will often still look like a perfectly valid value if you cheat and look at it (as you’re doing). (And even after it’s reused, it will often end up getting reused by some object of the same shape, so you won’t crash, you’ll just get odd results.)
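One hedged way to see memory going back to the allocator and being reused is to compare addresses. This is a sketch assuming CPython, where `id()` is the object's address; `Blob` and `address_reused` are made-up names, and the result is not guaranteed either way.

```python
class Blob:
    """Stand-in object with no special behaviour."""

def address_reused():
    first = Blob()
    old_address = id(first)   # on CPython, the object's memory address
    del first                 # memory goes back to the object allocator
    second = Blob()           # same shape, so it may land in the same slot
    return id(second) == old_address

reused = address_reused()     # frequently True on CPython, but never promised
```

When the address is reused, a stale `ctypes.cast(old_address, ctypes.py_object)` would see the new object rather than crash, which is the "odd results" case.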
Anyway, getting off this side track and back to the main point: releasing the locals reference to an object that’s no longer being used locally isn’t guaranteed to destroy the object—but in CPython, if locals is the only reference, the object will be destroyed immediately. That’s why Guido’s optimization makes sense.
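A small sketch of that distinction, assuming CPython's reference counting. `Payload`, `keep_alive`, and both function names are hypothetical, and the explicit `del` stands in for what the proposed optimization would do implicitly.

```python
import weakref

class Payload:
    """Stand-in for a large object."""

keep_alive = []

def sole_reference():
    data = Payload()
    probe = weakref.ref(data)
    del data                  # the local was the only reference
    return probe() is None    # destroyed right at the del on CPython

def shared_reference():
    data = Payload()
    keep_alive.append(data)   # a second reference outside the locals
    probe = weakref.ref(data)
    del data
    return probe() is None    # the list still holds the object
```

On CPython, `sole_reference()` returns True and `shared_reference()` returns False; on an implementation without refcounting, the first result is not guaranteed.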
The only way gc pause is relevant is for other implementations. For example, if CPython stops guaranteeing that x is alive until the end of the scope under certain conditions, PyPy could decide to do the same thing, and in PyPy, there is no refcount; garbage is deleted when it’s detected by the GC. So it wouldn’t be deterministic when x goes away, and the question of how much earlier it goes away and how much benefit there is becomes more complicated than in CPython. But the PyPy guys seem to be really good at figuring out how to test such questions empirically.
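Coming back to the AST question quoted at the top of the thread, here is a rough sketch of what such a rewrite could look like. It is not what the proposed optimization does internally. `add_del_statements` is a hypothetical helper; it assumes Python 3.9+ for `ast.unparse`, and it only handles names bound by a plain top-level `name = ...` inside a function. It ignores closures, `global`/`nonlocal`, and early returns, so the output would need review before committing.

```python
import ast

def add_del_statements(source: str) -> str:
    """Hypothetical helper: insert `del name` after a local's last top-level use."""
    tree = ast.parse(source)
    functions = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    for func in functions:
        assigned = set()   # names bound by a plain top-level `name = ...`
        last_use = {}      # name -> index of last top-level statement mentioning it
        for index, stmt in enumerate(func.body):
            if isinstance(stmt, ast.Assign):
                for target in stmt.targets:
                    if isinstance(target, ast.Name):
                        assigned.add(target.id)
            for node in ast.walk(stmt):
                if isinstance(node, ast.Name):
                    last_use[node.id] = index
        new_body = []
        final = len(func.body) - 1
        for index, stmt in enumerate(func.body):
            new_body.append(stmt)
            dead = sorted(n for n in assigned if last_use[n] == index)
            if dead and index != final:  # nothing to gain after the last statement
                targets = [ast.Name(id=n, ctx=ast.Del()) for n in dead]
                new_body.append(ast.Delete(targets=targets))
        func.body = new_body
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)

example = '''
def f():
    data = load()
    total = sum(data)
    print("done")
    return total
'''
rewritten = add_del_statements(example)
```

For the example, the rewritten source gains a `del data` right after `total = sum(data)` and leaves `total` alone, since it is still needed by the return. Diffing `rewritten` against the original is the part that would go through source control.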