[Python-Dev] Suggested memory API rules for 2.3
Tim Peters
tim.one@comcast.net
Wed, 03 Apr 2002 03:20:04 -0500
Guido and I sat down and hashed this out Tuesday afternoon. Here's what I
think Revealed Truth was. Indented text is intended to show up in the docs
(after Fred reworks for sanity). Unindented text (like this) is commentary.
Four technical words the docs really need are used freely:
shall - mandatory; all bets are off if you don't
shall not - forbidden; all bets are off if you do
should - recommended; may become mandatory later
should not - discouraged; may become forbidden later
Nothing is deprecated for 2.3. The layers of PyCore_XYZ macros Neil already
nuked are plain gone, and weren't documented anyway.
Nothing has been added to the public API (wrt 2.2 as a base).
These are the functions in the "raw memory" and "object memory"
APIs:
The raw memory API:
A. PyMem_{Malloc, Realloc, Free}
B. PyMem_{MALLOC, REALLOC, FREE}
C. PyMem_{New, Resize, Del}
D. PyMem_{NEW, RESIZE, DEL}
The object memory API:
E. PyObject_{Malloc, Realloc, Free}
F. PyObject_{MALLOC, REALLOC, FREE}
G. PyObject_{New, NewVar, Del}
H. PyObject_{NEW, NEWVAR, DEL}
Guido suggested leaving the PyObject_GC_xyz family out of this discussion,
as it's clean already, self-contained, can't mix with *any* of the guys
above, and has no compatibility nightmares to contend with (bless you,
Neil).
Function versus Macro Names
Names ending with a mixed-case suffix are called "function names"
(lines A, C, E, G). Names ending with an all-uppercase suffix are
called "macro names" (lines B, D, F, H). This is for convenience
in description, and does not necessarily reflect how a given name
is implemented in C.
Programs should use function names. Programs should not use
macro names. Uses of function names are guaranteed binary
compatible across releases; no such guarantee is given for uses
of macro names.
Yes, this means that almost all old code is now considered to be in bad
style, and you're on official notice that the macro names *may* be
deprecated down the road. I don't expect this to happen before 1.5.2 is no
longer of significant interest.
"Programs" does not include the Python core. We can do anything we need to
do in the core, based on our complete <heh> knowledge of each release's
internal implementation details.
Programs currently using a macro name should switch to the
corresponding function name (for example, replace PyMem_FREE with
PyMem_Free). In 2.3 and beyond, it's guaranteed that
corresponding function and macro names have the same input-output
behavior. Before 2.3, PyMem_MALLOC and PyMem_REALLOC did not
define what happened when passed a 0 argument.
Programs should not assume that macro name spellings "are faster"
than function name spellings. In most cases they aren't, and the
cases in which they are may vary across releases. Macro names are
retained only for backward compatibility.
We need to change PyMem_MALLOC and PyMem_REALLOC for 2.3. It's going to be
impossible for people to keep the rules straight if corresponding function
and macro versions ever have different basic semantics. Since the macro
names are not yet deprecated, they need to be fixed where they're currently
broken in this way.
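To make "same input-output behavior" for 0 arguments concrete, here is a minimal sketch in plain C. The names demo_malloc, demo_realloc, DEMO_MALLOC and DEMO_REALLOC are hypothetical stand-ins, not Python API. Treating a 0-byte request as a 1-byte request is an assumption made for the example, in the spirit of what the docs promise for PyMem_Malloc; it is not a claim about how 2.3 will actually implement it.

```c
#include <stdlib.h>

/* Hypothetical "function name" spellings (not Python API).
   A 0-byte request is turned into a 1-byte request, so the result
   does not depend on how the platform malloc/realloc treat 0. */
void *demo_malloc(size_t nbytes)
{
    return malloc(nbytes ? nbytes : 1);
}

void *demo_realloc(void *p, size_t nbytes)
{
    return realloc(p, nbytes ? nbytes : 1);
}

/* Hypothetical "macro name" spellings. They simply forward to the
   functions, so both spellings agree on every input, including 0. */
#define DEMO_MALLOC(n)     demo_malloc(n)
#define DEMO_REALLOC(p, n) demo_realloc((p), (n))
```

With this arrangement there is only one place where the 0 case is decided, so the two spellings can't drift apart.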
Mixing and Matching
After memory has been obtained via one of these functions, it
should be resized and freed only by a function from the same line,
except that PyMem_Free may be used freely in place of PyMem_Del,
and PyObject_Free in place of PyObject_Del.
Maintaining the Free/Del pseudo-distinction is pointless.
For backward compatibility, memory obtained via the object memory
family can be freed by any of Py{Mem, Object}_{Free, FREE, Del, DEL}.
Mixing functions from the object family with the raw memory family
is likely to become deprecated.
Memory obtained by PyMem_{Malloc, MALLOC} shall be resized only
by PyMem_{Realloc, REALLOC}.
Memory obtained by PyMem_{New, NEW} shall be resized only
by PyMem_{Resize, RESIZE}.
Memory obtained by PyObject_{Malloc, MALLOC} shall be resized
only by PyObject_{Realloc, REALLOC}.
Note that the eight ways to spell "free" all have to map to the pymalloc
free when pymalloc is enabled in 2.3. There is no way to spell "give me the
raw platform free(), damn it", except for "free". If we think it's
important to have such a way in the core, it should be added to the
private API.
PyObject_{Malloc, MALLOC, Realloc, REALLOC} should make the same guarantees
about 0 arguments as PyMem_xyz make. I'll have to dig into that; e.g.,
obmalloc.c's realloc currently goes out of its way *not* to treat 0 the way
the Python docs promise PyMem_Realloc works (I had already added an XXX
comment to the code there, but have not yet repaired it -- mostly because
"the rules" were still up in the air).
Relationship to Platform Allocator
All names in lines A, B, C and D ultimately invoke the platform
C malloc, realloc, or free. However, programs shall not mix any
of these names with direct calls to the platform malloc, calloc,
realloc, or free referencing the same base memory addresses, as
Python may need to perform bookkeeping of its own around its calls
to the platform allocator.
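A rough sketch of why mixing is unsafe once an allocator does bookkeeping around the platform calls. Everything here (demo_header, demo_malloc, demo_size, demo_free) is hypothetical and written in plain C; it is not how Python's allocators are implemented, just one simple form such bookkeeping could take.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bookkeeping record placed in front of each block.
   The extra union members are only there to keep the caller's memory
   reasonably aligned; that is an assumption made for this sketch. */
typedef union {
    size_t nbytes;
    long double pad_ld;
    void *pad_ptr;
} demo_header;

/* Obtain nbytes of memory, recording the request size in a hidden header. */
void *demo_malloc(size_t nbytes)
{
    demo_header *h;
    if (nbytes > (size_t)-1 - sizeof(demo_header))
        return NULL;                    /* header + nbytes would overflow */
    h = (demo_header *)malloc(sizeof(demo_header) + nbytes);
    if (h == NULL)
        return NULL;
    h->nbytes = nbytes;
    return (void *)(h + 1);             /* caller sees memory after header */
}

/* Report the size recorded when the block was obtained. */
size_t demo_size(const void *p)
{
    return ((const demo_header *)p - 1)->nbytes;
}

/* Release a block obtained from demo_malloc. */
void demo_free(void *p)
{
    if (p != NULL)
        free((demo_header *)p - 1);     /* step back to the real base */
}
```

The pointer handed to the caller is not the address the platform malloc returned, so passing it straight to the platform free() would be undefined behavior. Only demo_free knows how to get back to the real base address.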
We're not promising that it's going to be easy to play with some other
implementation of malloc/realloc/free. We get one person per year who tries
that, and they're usually trying a package that (sensibly enough) supplies
its own macro replacements for the tokens "free", "malloc" and "realloc"
specifically. For that reason, having our own macros like
PyYAGNI_InRealLifeThisAlwaysExpandsToFree is pretty pointless; if you want
to use some other malloc/realloc/free package, recompile everything using
that package's header files.
Note that all names in E, F, G and H invoke the Python object
allocator. The Python object allocator may or may not invoke the
platform allocator, and the object allocator may vary depending on
configuration options.
We don't promise anything about the internals of the object allocator.
Threading Rules
Programs shall hold the global interpreter lock before invoking
a function in the object allocator (E, F, G, H). Programs
need not, but may, hold the global interpreter lock before
invoking a function in the raw memory allocator (A, B, C, D),
except that a program shall hold the global interpreter lock
before invoking PyMem_{Free, FREE, Del, DEL} on memory originally
obtained from the object allocator (but note that a program should
not mix calls this way).
The conditions under which the GIL need not be held also need to be
documented in the thread/GIL docs.
Use as Function Designators
Programs shall not use a name from line B, C, D, F, G or H
in a C context requiring a function designator. Names from lines
A and E may be used in C contexts requiring a function designator.
A function designator is a C expression having function type;
a C context requiring a function designator is, for example, an
actual argument to a function that takes a function argument.
This implies the A and E names have to expand to function names (not
function applications, unless we're crazy). It wouldn't make sense to
*require* this of "macro names". Many of the other "function names" *have*
to be implemented as macros expanding to non-function-designator
expressions, because they're specified to return a result of a type given by
one of their arguments, and you just can't spell that in C without a macro
to cast to the required type.
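The distinction can be sketched in plain C. The names below (demo_malloc, demo_free, DEMO_NEW, release_all) are hypothetical stand-ins chosen for illustration, not Python API: demo_free plays the role of a line A or E name, and DEMO_NEW plays the role of a name like PyMem_New that takes a type as an argument.

```c
#include <stdlib.h>

/* Hypothetical stand-ins implemented as real functions, so their
   names are valid function designators. */
void *demo_malloc(size_t nbytes) { return malloc(nbytes ? nbytes : 1); }
void demo_free(void *p) { free(p); }

/* Hypothetical typed allocator. It takes a *type* as an argument and
   casts the result to that type, which only a macro can do in C.
   DEMO_NEW by itself is therefore not a function designator. */
#define DEMO_NEW(type, n) ((type *)demo_malloc((n) * sizeof(type)))

typedef void (*release_fn)(void *);

/* A function taking a function argument: the caller has to pass
   something that designates a function. Returns the number of
   blocks released. */
int release_all(void **blocks, int count, release_fn release)
{
    int i;
    for (i = 0; i < count; i++) {
        release(blocks[i]);
        blocks[i] = NULL;
    }
    return count;
}
```

Passing demo_free to release_all works because it names a function. There is no way to pass DEMO_NEW the same way, since it has no meaning until it is applied to a type and a count.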
Recommended Practice
When backward compatibility is not a concern, using this subset of
the raw memory and object memory APIs is recommended:
PyMem_{Malloc, Realloc, Free}
PyObject_{Malloc, Realloc, Free}
PyObject_{New, NewVar}
Always use PyMem functions on memory obtained from a PyMem function,
and likewise for PyObject.
Uses of PyMem_{New, Resize} are probably clearer if you do the
multiplication and call the platform malloc/realloc yourself
(although be aware that platform malloc/realloc vary in how they
handle 0 arguments). Their existence in the Python C API dates
back to a time when object allocation was handled in a different
way.
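For what "do the multiplication yourself" might look like, here is a minimal sketch in plain C. The helper name alloc_array is hypothetical, and both the overflow check and the treatment of a 0-byte request as 1 byte are assumptions added for the example, not behavior specified above.

```c
#include <stdlib.h>

/* Hypothetical helper: room for n elements of elem_size bytes each.
   Returns NULL if the multiplication would overflow size_t or if the
   platform malloc fails. */
void *alloc_array(size_t n, size_t elem_size)
{
    size_t nbytes;
    if (elem_size != 0 && n > (size_t)-1 / elem_size)
        return NULL;                    /* n * elem_size would overflow */
    nbytes = n * elem_size;
    /* Platform malloc varies on 0, so never ask it for 0 bytes. */
    return malloc(nbytes ? nbytes : 1);
}
```

Memory obtained this way comes from the platform malloc directly, so it has to be released with the platform free() and not with any of the PyMem or PyObject names.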
There's no reason not to use PyObject_Del, but it's the same as
PyObject_Free, and using PyObject_Free instead should remind
you that it does nothing except release the memory.