[Python-Dev] Suggested memory API rules for 2.3

Tim Peters tim.one@comcast.net
Wed, 03 Apr 2002 03:20:04 -0500


Guido and I sat down and hashed this out Tuesday afternoon.  Here's what I
think Revealed Truth was.  Indented text is intended to show up in the docs
(after Fred reworks for sanity).  Unindented text (like this) is commentary.
Four technical words the docs really need are used freely:

    shall      - mandatory; all bets are off if you don't
    shall not  - forbidden; all bets are off if you do
    should     - recommended; may become mandatory later
    should not - discouraged; may become forbidden later

Nothing is deprecated for 2.3.  The layers of PyCore_XYZ macros Neil already
nuked are plain gone, and weren't documented anyway.

Nothing has been added to the public API (wrt 2.2 as a base).

    These are the functions in the "raw memory" and "object memory"
    APIs:

    The raw memory API:

    A. PyMem_{Malloc, Realloc, Free}
    B. PyMem_{MALLOC, REALLOC, FREE}

    C. PyMem_{New, Resize, Del}
    D. PyMem_{NEW, RESIZE, DEL}

    The object memory API:

    E. PyObject_{Malloc, Realloc, Free}
    F. PyObject_{MALLOC, REALLOC, FREE}

    G. PyObject_{New, NewVar, Del}
    H. PyObject_{NEW, NEWVAR, DEL}

Guido suggested leaving the PyObject_GC_xyz family out of this discussion,
as it's clean already, self-contained, can't mix with *any* of the guys
above, and has no compatibility nightmares to contend with (bless you,
Neil).

    Function versus Macro Names

    Names ending with a mixed-case suffix are called "function names"
    (lines A, C, E, G).  Names ending with an all-uppercase suffix are
    called "macro names" (lines B, D, F, H).  This is for convenience
    in description, and does not necessarily reflect how a given name
    is implemented in C.

    Programs should use function names.  Programs should not use
    macro names.  Uses of function names are guaranteed binary
    compatible across releases; no such guarantee is given for uses
    of macro names.

Yes, this means that almost all old code is now considered to be in bad
style, and you're on official notice that the macro names *may* be
deprecated down the road.  I don't expect this to happen before 1.5.2 is no
longer of significant interest.

"Programs" does not include the Python core.  We can do anything we need to
do in the core, based on our complete <heh> knowledge of each release's
internal implementation details.

    Programs currently using a macro name should switch to the
    corresponding function name (for example, replace PyMem_FREE with
    PyMem_Free).  In 2.3 and beyond, it's guaranteed that
    corresponding function and macro names have the same input-output
    behavior.  Before 2.3, PyMem_MALLOC and PyMem_REALLOC did not
    define what happened when passed a 0 argument.

    Programs should not assume that macro name spellings "are faster"
    than function name spellings.  In most cases they aren't, and the
    cases in which they are may vary across releases.  Macro names are
    retained only for backward compatibility.

We need to change PyMem_MALLOC and PyMem_REALLOC for 2.3.  It's going to be
impossible for people to keep the rules straight if corresponding function
and macro versions ever have different basic semantics.  Since the macro
names are not yet deprecated, they need to be fixed where they're currently
broken in this way.

    Mixing and Matching

    After memory has been obtained via one of these functions, it
    should be resized and freed only by a function from the same line,
    except that PyMem_Free may be used freely in place of PyMem_Del,
    and PyObject_Free in place of PyObject_Del.

Maintaining the Free/Del pseudo-distinction is pointless.

    For backward compatibility, memory obtained via the object memory
    family can be freed by any of Py{Mem, Object}_{Free, FREE, Del, DEL}.
    Mixing functions from the object family with the raw memory family
    is likely to become deprecated.

    Memory obtained by PyMem_{Malloc, MALLOC} shall be resized only
    by PyMem_{Realloc, REALLOC}.

    Memory obtained by PyMem_{New, NEW} shall be resized only
    by PyMem_{Resize, RESIZE}.

    Memory obtained by PyObject_{Malloc, MALLOC} shall be resized
    only by PyObject_{Realloc, REALLOC}.

Note that the eight ways to spell "free" all have to map to the pymalloc
free when pymalloc is enabled in 2.3.  There is no way to spell "give me the
raw platform free(), damn it", except for "free".  If we think it's
important to have such a way in the core, it should be added to the
private API.

PyObject_{Malloc, MALLOC, Realloc, REALLOC} should make the same guarantees
about 0 arguments as PyMem_xyz make.  I'll have to dig into that; e.g.,
obmalloc.c's realloc currently goes out of its way *not* to treat 0 the way
the Python docs promise PyMem_Realloc works (I had already added an XXX
comment to the code there, but have not yet repaired it, mostly because
"the rules" were still up in the air).

    Relationship to Platform Allocator

    All names in lines A, B, C and D ultimately invoke the platform
    C malloc, realloc, or free.  However, programs shall not mix any
    of these names with direct calls to the platform malloc, calloc,
    realloc, or free referencing the same base memory addresses, as
    Python may need to perform bookkeeping of its own around its calls
    to the platform allocator.

We're not promising that it's going to be easy to play with some other
implementation of malloc/realloc/free.  We get one person per year who tries
that, and they're usually trying a package that (sensibly enough) supplies
its own macro replacements for the tokens "free", "malloc" and "realloc"
specifically.  For that reason, having our own macros like
PyYAGNI_InRealLifeThisAlwaysExpandsToFree is pretty pointless; if you want
to use some other malloc/realloc/free package, recompile everything using
that package's header files.

    Note that all names in E, F, G and H invoke the Python object
    allocator.  The Python object allocator may or may not invoke the
    platform allocator, and the object allocator may vary depending on
    configuration options.

We don't promise anything about the internals of the object allocator.

    Threading Rules

    Programs shall hold the global interpreter lock before invoking
    a function in the object allocator (E, F, G, H).  Programs
    need not, but may, hold the global interpreter lock before
    invoking a function in the raw memory allocator (A, B, C, D),
    except that a program shall hold the global interpreter lock
    before invoking PyMem_{Free, FREE, Del, DEL} on memory originally
    obtained from the object allocator (but note that a program should
    not mix calls this way).

The conditions under which the GIL need not be held also need to be
documented in the thread/GIL docs.

    Use as Function Designators

    Programs shall not use a name from line B, C, D, F, G or H
    in a C context requiring a function designator.  Names from lines
    A and E may be used in C contexts requiring a function designator.
    A function designator is a C expression having function type;
    C contexts requiring a function designator include use as an
    actual argument to a function taking a function argument.

This implies the A and E names have to expand to function names (not
function applications, unless we're crazy).  It wouldn't make sense to
*require* this of "macro names".  Many of the other "function names" *have*
to be implemented as macros expanding to non-function-designator
expressions, because they're specified to return a result of a type given by
one of their arguments, and you just can't spell that in C without a macro
to cast to the required type.

    Recommended Practice

    When backward compatibility is not a concern, using this subset of
    the raw memory and object memory APIs is recommended:

    PyMem_{Malloc, Realloc, Free}
    PyObject_{Malloc, Realloc, Free}
    PyObject_{New, NewVar}

    Always use PyMem functions on memory obtained from a PyMem function,
    and likewise for PyObject.

    Uses of PyMem_{New, Resize} are probably clearer if you do the
    multiplication and call the platform malloc/realloc yourself
    (although be aware that platform malloc/realloc vary in how they
    handle 0 arguments).  Their existence in the Python C API dates
    back to a time when object allocation was handled in a different
    way.
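
For what "do the multiplication yourself" might look like, here is a
minimal sketch using only the platform allocator.  The name alloc_array is
made up, and both the overflow check and the choice to turn a 0-byte request
into a 1-byte request are my assumptions, not something the rules require.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate room for n elements of elem_size bytes each with the platform
 * malloc.  Returns NULL if n * elem_size would overflow or if malloc fails.
 * Memory obtained here must be released with the platform free(), never
 * with PyMem_Free or PyObject_Free.  alloc_array is a hypothetical name. */
static void *
alloc_array(size_t n, size_t elem_size)
{
    size_t nbytes;

    assert(elem_size > 0);
    if (n > SIZE_MAX / elem_size)
        return NULL;            /* the product would not fit in size_t */
    nbytes = n * elem_size;
    if (nbytes == 0)
        nbytes = 1;             /* sidestep platform differences for malloc(0) */
    return malloc(nbytes);
}
```

The explicit multiplication makes the byte count visible at the call site,
at the price of having to think about overflow and 0 yourself.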

    There's no reason not to use PyObject_Del, but it's the same as
    PyObject_Free, and using PyObject_Free instead should remind
    you that it does nothing except release the memory.