Guido and I sat down and hashed this out Tuesday afternoon. Here's what I think Revealed Truth was. Indented text is intended to show up in the docs (after Fred reworks for sanity). Unindented text (like this) is commentary.

Four technical words the docs really need are used freely:

    shall      - mandatory; all bets are off if you don't
    shall not  - forbidden; all bets are off if you do
    should     - recommended; may become mandatory later
    should not - discouraged; may become forbidden later

Nothing is deprecated for 2.3. The layers of PyCore_XYZ macros Neil already nuked are plain gone, and weren't documented anyway. Nothing has been added to the public API (wrt 2.2 as a base).

    These are the functions in the "raw memory" and "object memory" APIs:

    The raw memory API:
        A. PyMem_{Malloc, Realloc, Free}
        B. PyMem_{MALLOC, REALLOC, FREE}
        C. PyMem_{New, Resize, Del}
        D. PyMem_{NEW, RESIZE, DEL}

    The object memory API:
        E. PyObject_{Malloc, Realloc, Free}
        F. PyObject_{MALLOC, REALLOC, FREE}
        G. PyObject_{New, NewVar, Del}
        H. PyObject_{NEW, NEWVAR, DEL}

Guido suggested leaving the PyObject_GC_xyz family out of this discussion, as it's clean already, self-contained, can't mix with *any* of the guys above, and has no compatibility nightmares to contend with (bless you, Neil).

    Function versus Macro Names

    Names ending with a mixed-case suffix are called "function names" (lines A, C, E, G). Names ending with an all-uppercase suffix are called "macro names" (lines B, D, F, H). This is for convenience in description, and does not necessarily reflect how a given name is implemented in C.

    Programs should use function names. Programs should not use macro names. Uses of function names are guaranteed binary compatible across releases; no such guarantee is given for uses of macro names.

Yes, this means that almost all old code is now considered to be in bad style, and you're on official notice that the macro names *may* be deprecated down the road.
I don't expect this to happen before 1.5.2 is no longer of significant interest.

"Programs" does not include the Python core. We can do anything we need to do in the core, based on our complete <heh> knowledge of each release's internal implementation details.

    Programs currently using a macro name should switch to the corresponding function name (for example, replace PyMem_FREE with PyMem_Free).

    In 2.3 and beyond, it's guaranteed that corresponding function and macro names have the same input-output behavior. Before 2.3, PyMem_MALLOC and PyMem_REALLOC did not define what happened when passed a 0 argument.

    Programs should not assume that macro name spellings "are faster" than function name spellings. In most cases they aren't, and the cases in which they are may vary across releases. Macro names are retained only for backward compatibility.

We need to change PyMem_MALLOC and PyMem_REALLOC for 2.3. It's going to be impossible for people to keep the rules straight if corresponding function and macro versions ever have different basic semantics. Since the macro names are not yet deprecated, they need to be fixed where they're currently broken in this way.

    Mixing and Matching

    After memory has been obtained via one of these functions, it should be resized and freed only by a function from the same line, except that PyMem_Free may be used freely in place of PyMem_Del, and PyObject_Free in place of PyObject_Del.

Maintaining the Free/Del pseudo-distinction is pointless.

    For backward compatibility, memory obtained via the object memory family can be freed by any of Py{Mem, Object}_{Free, FREE, Del, DEL}. Mixing functions from the object family with the raw memory family is likely to become deprecated.

    Memory obtained by PyMem_{Malloc, MALLOC} shall be resized only by PyMem_{Realloc, REALLOC}.

    Memory obtained by PyMem_{New, NEW} shall be resized only by PyMem_{Resize, RESIZE}.

    Memory obtained by PyObject_{Malloc, MALLOC} shall be resized only by PyObject_{Realloc, REALLOC}.
Note that the eight ways to spell "free" all have to map to the pymalloc free when pymalloc is enabled in 2.3. There is no way to spell "give me the raw platform free(), damn it", except for "free". If we think it's important to have such a way in the core, it should be added to the private API.

PyObject_{Malloc, MALLOC, Realloc, REALLOC} should make the same guarantees about 0 arguments as PyMem_xyz make. I'll have to dig into that, and, e.g., obmalloc.c's realloc currently goes out of its way *not* to treat 0 the way the Python docs promise PyMem_Realloc works (I had already added an XXX comment to the code there, but have not yet repaired it -- mostly because "the rules" were still up in the air).

    Relationship to Platform Allocator

    All names in lines A, B, C and D ultimately invoke the platform C malloc, realloc, or free. However, programs shall not mix any of these names with direct calls to the platform malloc, calloc, realloc, or free referencing the same base memory addresses, as Python may need to perform bookkeeping of its own around its calls to the platform allocator.

We're not promising that it's going to be easy to play with some other implementation of malloc/realloc/free. We get one person per year who tries that, and they're usually trying a package that (sensibly enough) supplies its own macro replacements for the tokens "free", "malloc" and "realloc" specifically. For that reason, having our own macros like PyYAGNI_InRealLifeThisAlwaysExpandsToFree is pretty pointless; if you want to use some other malloc/realloc/free package, recompile everything using that package's header files.

    Note that all names in E, F, G and H invoke the Python object allocator. The Python object allocator may or may not invoke the platform allocator, and the object allocator may vary depending on configuration options.

We don't promise anything about the internals of the object allocator.
    Threading Rules

    Programs shall hold the global interpreter lock before invoking a function in the object allocator (E, F, G, H). Programs need not, but may, hold the global interpreter lock before invoking a function in the raw memory allocator (A, B, C, D), except that a program shall hold the global interpreter lock before invoking PyMem_{Free, FREE, Del, DEL} on memory originally obtained from the object allocator (but note that a program should not mix calls this way).

The conditions under which the GIL need not be held need also to be documented in the thread/GIL docs.

    Use as Function Designators

    Programs shall not use a name from line B, C, D, F, G or H in a C context requiring a function designator. Names from lines A and E may be used in C contexts requiring a function designator. A function designator is a C expression having function type; C contexts requiring a function designator include as an actual argument to a function taking a function argument.

This implies the A and E names have to expand to function names (not function applications, unless we're crazy). It wouldn't make sense to *require* this of "macro names". Many of the other "function names" *have* to be implemented as macros expanding to non-function-designator expressions, because they're specified to return a result of a type given by one of their arguments, and you just can't spell that in C without a macro to cast to the required type.

    Recommended Practice

    When backward compatibility is not a concern, using this subset of the raw memory and object memory APIs is recommended:

        PyMem_{Malloc, Realloc, Free}
        PyObject_{Malloc, Realloc, Free}
        PyObject_{New, NewVar}

    Always use PyMem functions on memory obtained from a PyMem function, and likewise for PyObject.

Uses of PyMem_{New, Resize} are probably clearer if you do the multiplication and call the platform malloc/realloc yourself (although be aware that platform malloc/realloc vary in how they handle 0 arguments).
Their existence in the Python C API dates back to a time when object allocation was handled in a different way. There's no reason not to use PyObject_Del, but it's the same as PyObject_Free, and using PyObject_Free instead should remind you that it does nothing except release the memory.
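
The remark above about doing the multiplication yourself can be sketched roughly as follows. This is only an illustration: alloc_array is a hypothetical helper, not part of the Python C API, and the overflow check and the "ask for at least one byte" policy are assumptions added here, since the text only mentions the multiplication and the varying treatment of 0 arguments.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not a Python API): allocate room for `count`
 * elements of `elem_size` bytes each using the platform malloc.
 * Returns NULL if the byte count would overflow size_t or malloc fails. */
void *
alloc_array(size_t count, size_t elem_size)
{
    size_t nbytes;

    /* Overflow check: an addition for this sketch, not from the text. */
    if (elem_size != 0 && count > SIZE_MAX / elem_size)
        return NULL;
    nbytes = count * elem_size;

    /* Platform malloc(0) may return NULL or a unique pointer.  Asking
     * for at least one byte makes NULL always mean failure. */
    if (nbytes == 0)
        nbytes = 1;
    return malloc(nbytes);
}
```

Memory obtained this way comes from the platform allocator, so under the no-mixing rule it would be released with the platform free(), not with any PyMem or PyObject spelling.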
Tim> Guido and I sat down and hashed this out Tuesday afternoon. Here's
Tim> what I think Revealed Truth was.

Hallelujah! Even I understood it. I can't imagine adapting to this scheme should be difficult. The only thing that threw me a little was the alphabet soup in

    Use as Function Designators

    Programs shall not use a name from line B, C, D, F, G or H in a C context requiring a function designator. Names from lines A and E may be used in C contexts requiring a function designator. A function designator is a C expression having function type; C contexts requiring a function designator include as an actual argument to a function taking a function argument.

This just means that PyMem_{Malloc, Realloc, Free} and PyObject_{Malloc, Realloc, Free} will be implemented as functions or simple macros that expand to a single function call, and that the others can be all sorts of CPP gobbledygook, right?

Skip
Use as Function Designators
Programs shall not use a name from line B, C, D, F, G or H in a C context requiring a function designator. Names from lines A and E may be used in C contexts requiring a function designator. A function designator is a C expression having function type; C contexts requiring a function designator include as an actual argument to a function taking a function argument.
[Skip Montanaro]
This just means that PyMem_{Malloc, Realloc, Free} and PyObject_{Malloc, Realloc, Free} will be implemented as functions or simple macros that expand to a single function call, and that the others can be all sorts of CPP gobbledygook, right?
That was my intent, but it's hard to predict how courts will rule <wink>.
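
A small sketch may make the function-designator distinction concrete. Every name below is a hypothetical stand-in rather than the real Python API: demo_malloc plays the part of a line A or E name, and DEMO_NEW plays the part of a name that must be a macro because it casts its result to a type passed as an argument.

```c
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names exist in Python. */

/* A real function: its name is a function designator. */
void *
demo_malloc(size_t nbytes)
{
    return malloc(nbytes ? nbytes : 1);
}

/* A macro that takes a type and casts the result to it.  C has no way
 * to express "returns a pointer to the type named by my argument" as a
 * function, so this has to be a macro. */
#define DEMO_NEW(type, n) ((type *)demo_malloc((n) * sizeof(type)))

typedef void *(*alloc_fn)(size_t);

/* A context requiring a function designator: a function-pointer
 * parameter. */
void *
call_allocator(alloc_fn alloc, size_t nbytes)
{
    return alloc(nbytes);
}
```

With these definitions, call_allocator(demo_malloc, 16) compiles, while call_allocator(DEMO_NEW, 16) does not: a function-like macro name that is not followed by an opening parenthesis is not expanded, so the compiler sees an undeclared identifier.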
The only thing that threw me a little was the alphabet soup
Fred needs to work with me to make the real docs comprehensible; all this stuff is on top of what's already said in the docs, and I believe rearrangement is called for (like combining the scattered sections about the memory APIs, for a start).
Tim Peters wrote:
Guido and I sat down and hashed this out Tuesday afternoon. Here's what I think Revealed Truth was.
Very good. Spelling out the rules was long overdue.
Note that the eight ways to spell "free" all have to map to the pymalloc free when pymalloc is enabled in 2.3. There is no way to spell "give me the raw platform free(), damn it", except for "free".
I guess we need to do something special for pymalloc free() if PYMALLOC_DEBUG is defined.
We don't promise anything about the internals of the object allocator.
Good. Someone might write platform specific versions that use mmap or something.
Recommended Practice
When backward compatibility is not a concern, using this subset of the raw memory and object memory APIs is recommended:
PyMem_{Malloc, Realloc, Free} PyObject_{Malloc, Realloc, Free} PyObject_{New, NewVar}
This should probably come up front in the documentation. All the rest is historical crap. :-)
Neil
[Neil Schemenauer]
Very good. Spelling out the rules was long overdue.
That still needs to be done for PyObject_GC_xyz. "Shall not mix", "shall hold the GIL", "guaranteed binary compatibility", and "shall not use in contexts requiring function designators" are the no-brainer answers. Any objections there?
Note that the eight ways to spell "free" all have to map to the pymalloc free when pymalloc is enabled in 2.3. There is no way to spell "give me the raw platform free(), damn it", except for "free".
I guess we need to do something special for pymalloc free() if PYMALLOC_DEBUG is defined.
I expect the simplest is to redirect all the "free" spellings to _PyMalloc_DebugFree then; then I'll have to change that function to pass the address on to system free() or _PyMalloc_Free(), depending on where the address came from. An alternative with some real attractions is to redirect all "get memory" spellings to _PyMalloc_DebugMalloc in PYMALLOC_DEBUG mode, in which case I'd need to change _PyMalloc_DebugMalloc to serialize via a lock, and change _PyMalloc_DebugFree to serialize too and just *verify* that all addresses passed to it came from _PyMalloc_Debug{Malloc, Realloc}.
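
The second alternative (route every "get memory" spelling through a debug allocator and have the debug free verify where an address came from) might look roughly like the sketch below. It is an assumption-laden illustration, not the pymalloc debug code: the names, the fixed-size table, and the return-code convention are all hypothetical, and it leaves out the lock mentioned above, so it is only suitable for single-threaded use.

```c
#include <stddef.h>
#include <stdlib.h>

#define DEMO_MAX_LIVE 64    /* arbitrary capacity chosen for the sketch */

/* Addresses handed out by demo_debug_malloc and not yet freed. */
static void *live_blocks[DEMO_MAX_LIVE];

/* Hypothetical debug allocator: records each address it hands out. */
void *
demo_debug_malloc(size_t nbytes)
{
    size_t i;
    void *p;

    for (i = 0; i < DEMO_MAX_LIVE; i++) {
        if (live_blocks[i] == NULL)
            break;
    }
    if (i == DEMO_MAX_LIVE)
        return NULL;        /* table full; reported as allocation failure */
    p = malloc(nbytes ? nbytes : 1);
    live_blocks[i] = p;     /* stays NULL (slot free) if malloc failed */
    return p;
}

/* Hypothetical debug free: releases p only if demo_debug_malloc handed
 * it out.  Returns 1 on success, 0 if the address is unknown, in which
 * case the memory is left untouched. */
int
demo_debug_free(void *p)
{
    size_t i;

    if (p == NULL)
        return 1;           /* like free(NULL), a no-op */
    for (i = 0; i < DEMO_MAX_LIVE; i++) {
        if (live_blocks[i] == p) {
            live_blocks[i] = NULL;
            free(p);
            return 1;
        }
    }
    return 0;
}
```

A real debug allocator would presumably report the bad address rather than return a status code; the return value here just keeps the sketch easy to exercise.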
Recommended Practice ...
This should probably come up front in the documentation. All the rest is historical crap. :-)
I agree, but this is Fred's call in the end. In some ways the memory API docs need to be more precise, but in others they need to give succinct friendly advice about the least-hassle way to proceed. Alas, the compatibility maze is so large it's bound to dominate the docs no matter how it's rearranged.
Tim Peters wrote:
That still needs to be done for PyObject_GC_xyz. "Shall not mix", "shall hold the GIL", "guaranteed binary compatibility",
Yes.
"shall not use in contexts requiring function designators"
People need some form of GC free to initialize tp_free. Extensions can't really use _PyObject_GC_Del, can they?
Neil
[Tim]
That still needs to be done for PyObject_GC_xyz. "Shall not mix", "shall hold the GIL", "guaranteed binary compatibility",
[Neil, the PyObject_GC Master]
Yes.
"shall not use in contexts requiring function designators"
People need some form of GC free to initialize tp_free. Extensions can't really use _PyObject_GC_Del, can they?
No they can't. So we promise that PyObject_GC_Del can be used in contexts requiring function designators. We're sure that PyObject_GC_{New, NewVar, Resize} can't, since they cast their results to a type "passed" as an argument. Harmonic convergence?
[Neil Schemenauer]
People will have to cast to PyObject* when calling PyObject_GC_Del.
This depends on whether you want to leave its signature alone, or declare it as taking a void*.
I guess that will be consistent with the other _Free and _Del functions.
One reason I prefer the other _Free functions over their _Del versions is that they're already declared to take void*, just like C free(); e.g.,
    extern DL_IMPORT(void) PyObject_Free(void *);
Some of our macros do redundant casts to void* now when invoking these guys. In any case, if people stick to the "recommended" API {PyMem, PyObject}_Free, they should not need to cast their arguments. Note that there's no type safety in the way we've actually implemented things even for the PyObject_Del spelling:
    #define PyObject_Del(op) _PyObject_Del((PyObject *)(op))
That is, no matter what kind of goofy pointer a programmer may pass, we silently cast it to PyObject* anyway. Better then (IMO) to change _PyObject_Del's signature to void*, let *it* cast to PyObject* internally, and lose the macro trick.
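
The two styles being compared can be put side by side in a short sketch. The names are hypothetical stand-ins (demo_object for PyObject, DEMO_DEL for the cast-in-a-macro style, demo_free for the void* style), and the bodies simply call the platform free() for illustration.

```c
#include <stdlib.h>

/* Hypothetical object header, standing in for PyObject. */
typedef struct demo_object {
    int refcount;
} demo_object;

/* Style 1: the function takes demo_object*, and a macro casts whatever
 * pointer the caller passes, so the compiler never gets to complain. */
void
demo_del_impl(demo_object *op)
{
    free(op);
}
#define DEMO_DEL(op) demo_del_impl((demo_object *)(op))

/* Style 2: the function takes void*, like C free(), and does any cast
 * it needs internally.  No macro is required, and the plain function
 * name can be stored in a function pointer. */
void
demo_free(void *p)
{
    demo_object *op = (demo_object *)p;

    free(op);
}
```

Neither style gives type safety, which is the point made above; the second just gets the same effect without the macro and keeps the name usable as a function designator.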
"Programs" does not include the Python core. We can do anything we need to do in the core, based on our complete <heh> knowledge of each release's internal implementation details.
However, since many people use the core as an example, we may have to be careful here. Certainly xxmodule.c, xxobject.c and xxsubtype.c should do the recommended thing; but if too many of the built-in objects use the macros despite those being in danger of deprecation, that might still perpetuate use of those macros.
Mixing and Matching
After memory has been obtained via one of these functions, it should be resized and freed only by a function from the same line, except that PyMem_Free may be used freely in place of PyMem_Del, and PyObject_Free in place of PyObject_Del.
I actually had a slightly different rule in mind: mixing and matching between two adjacent lines that only differ by case is also permitted. Except of course that the macro lines will be deprecated, so maybe it doesn't matter to spell this out.
Maintaining the Free/Del pseudo-distinction is pointless.
For backward compatibility, memory obtained via the object memory family can be freed by any of Py{Mem, Object}_{Free, FREE, Del, DEL}. Mixing functions from the object family with the raw memory family is likely to become deprecated.
Memory obtained by PyMem_{Malloc, MALLOC} shall be resized only by PyMem_{Realloc, REALLOC}.
Memory obtained by PyMem_{New, NEW} shall be resized only by PyMem_{Resize, RESIZE}.
Memory obtained by PyObject_{Malloc, MALLOC} shall be resized only by PyObject_{Realloc, REALLOC}.
Note that the eight ways to spell "free" all have to map to the pymalloc free when pymalloc is enabled in 2.3. There is no way to spell "give me the raw platform free(), damn it", except for "free". If we think it's important to have such a way in the core, it should be added to the private API.
IMO, PyMem_Malloc() and PyMem_Free() should be good enough.
Relationship to Platform Allocator
All names in lines A, B, C and D ultimately invoke the platform C malloc, realloc, or free. However, programs shall not mix any of these names with direct calls to the platform malloc, calloc, realloc, or free referencing the same base memory addresses, as Python may need to perform bookkeeping of its own around its calls to the platform allocator.
Really? Why not just say these are wrappers around malloc and free? (On platforms where it matters, they will be guaranteed to use the same heap as the core uses -- this is apparently an issue with Windows DLLs.)
--Guido van Rossum (home page: http://www.python.org/~guido/)
"Programs" does not include the Python core. We can do anything we need to do in the core, based on our complete <heh> knowledge of each release's internal implementation details.
[Guido]
However, since many people use the core as an example, we may have to be careful here. Certainly xxmodule.c, xxobject.c and xxsubtype.c should do the recommended thing;
I'll make sure that they do.
but if too many of the built-in objects use the macros despite those being in danger of deprecation, that might still perpetuate use of those macros.
It's hard to imagine that the macro names will often be cheaper than the function names, given the new requirement that they have identical I/O behavior. So, yes, I expect a lot of mindless source edits to replace core use of macro names with function names (and I'm good at massive mindless source edits <wink>). There may still be cases where the core wants to exploit that, e.g., it *knows* a malloc argument is > 0, and so skip the _PyMem_EXTRA business; if so, I expect we'll grow an internal API spelling for that.
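
One way to picture "identical I/O behavior" for 0 arguments, and why a caller that knows its argument is positive might want a separate spelling, is the sketch below. All names are hypothetical, and mapping a 0 request to a 1-byte request is just one possible policy assumed here for illustration; it is not claimed to be what the core does.

```c
#include <stdlib.h>

/* Hypothetical function spelling: never passes 0 to the platform
 * allocator, so a NULL result always means failure. */
void *
demo_mem_malloc(size_t nbytes)
{
    return malloc(nbytes ? nbytes : 1);
}

/* Same policy for resizing. */
void *
demo_mem_realloc(void *p, size_t nbytes)
{
    return realloc(p, nbytes ? nbytes : 1);
}

/* Hypothetical macro spelling: same behavior by construction, because
 * it simply expands to a call of the function spelling. */
#define DEMO_MEM_MALLOC(n) demo_mem_malloc(n)

/* Hypothetical internal spelling for callers that already know
 * nbytes > 0 and want to skip the zero check. */
void *
demo_mem_malloc_nonzero(size_t nbytes)
{
    return malloc(nbytes);
}
```

Once the macro spelling has to behave exactly like the function spelling, about the only saving left to it is a function call, which fits the expectation above that macro names will rarely be cheaper.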
After memory has been obtained via one of these functions, it should be resized and freed only by a function from the same line, except that PyMem_Free may be used freely in place of PyMem_Del, and PyObject_Free in place of PyObject_Del.
I actually had a slightly different rule in mind:
I did too at first.
mixing and matching between two adjacent lines that only differ by case is also permitted. Except of course that the macro lines will be deprecated, so maybe it doesn't matter to spell this out.
Exactly so. Since programs should not use macro names at all anymore, text devoted to drawing fine distinctions among macro name use is pretty much a waste. Note there is no "shall" or "shall not" text *forbidding* mixing corresponding macro and function names: the text above is only "should be ... from the same line". The later blurbs of the form
    Memory obtained by PyMem_{Malloc, MALLOC} shall be resized only by PyMem_{Realloc, REALLOC}.
are *not* intended to be read as forbidding mixing PyMem_Malloc with PyMem_REALLOC. Perhaps that would be clearer said
    Memory obtained by either of PyMem_{Malloc, MALLOC} shall be resized only by either of PyMem_{Realloc, REALLOC}.
The exhaustive tables I posted earlier are much clearer for stuff like this, but, again, I'm decreasingly keen to spell out elaborate details for spellings that are discouraged.
[on whether private-API spellings for raw malloc etc should be introduced]
IMO, PyMem_Malloc() and PyMem_Free() should be good enough.
I was expecting this would be an objective matter resolved by exhaustive study of the code base when renaming the macro uses. I'm happy to settle it arbitrarily, though <wink>.
Relationship to Platform Allocator
All names in lines A, B, C and D ultimately invoke the platform C malloc, realloc, or free. However, programs shall not mix any of these names with direct calls to the platform malloc, calloc, realloc, or free referencing the same base memory addresses, as Python may need to perform bookkeeping of its own around its calls to the platform allocator.
Really? Why not just say these are wrappers around malloc and free?
I'm specifically trying to leave the door open for the PyMalloc_DebugXYZ routines to capture all uses of memory API functions in PYMALLOC_DEBUG mode. It also reads better as-is than to say, well, ya, OK, I suppose you *can* mix raw calls to libc with these particular memory API families, but the PyObject_GC_xyz family is entirely different in this respect. I don't want to draw distinctions just because the current implementation makes it possible to draw them; users shall <wink> consider malloc/realloc/free as a distinct API.
(On platforms where it matters, they will be guaranteed to use the same heap as the core uses -- this is apparently an issue with Windows DLLs.)
But that's another reason (if it's real <wink>) to say "no mixing" here: if this is a use they care about, then, e.g., they're dead meat if they mix PyMem_Malloc() with raw free() (the former would use the malloc in the Python DLL, the latter the free in the user's DLL).
participants (4): Guido van Rossum, Neil Schemenauer, Skip Montanaro, Tim Peters