[Python-Dev] Proposed: a new function-based C API for declaring Python types

Gregory P. Smith greg at krypto.org
Fri May 1 00:44:12 CEST 2009


On Tue, Apr 28, 2009 at 8:03 PM, Larry Hastings <larry at hastings.org> wrote:

>
> EXECUTIVE SUMMARY
>
> I've written a patch against py3k trunk creating a new function-based
> API for creating  extension types in C.  This allows PyTypeObject to
> become a (mostly) private structure.
>
>
> THE PROBLEM
>
> Here's how you create an extension type using the current API.
>
>  * First, find some code that already has a working type declaration.
>   Copy and paste their fifty-line PyTypeObject declaration, then
>   hack it up until it looks like what you need.
>
>  * Next--hey!  There *is* no next, you're done.  You can immediately
>   create an object using your type and pass it into the Python
>   interpreter and it would work fine.  You are encouraged to call
>   PyType_Ready(), but this isn't required and it's often skipped.
>
> This approach causes two problems.
>
>  1) The Python interpreter *must support* and *cannot change*
>    the PyTypeObject structure, forever.  Any meaningful change to
>    the structure will break every extension.   This has many
>    consequences:
>      a) Fields that are no longer used must be left in place,
>         forever, as ignored placeholders if need be.  Py3k cleaned
>     up a lot of these, but it's already picked up a new one
>     ("tp_compare" is now "tp_reserved").
>      b) Internal implementation details of the type system must
>         be public.
>      c) The interpreter can't even use a different structure
>         internally, because extensions are free to pass in objects
>     using PyTypeObjects the interpreter has never seen before.
>
>  2) As a programming interface this lacks a certain gentility.  It
>    clearly *works*, but it requires programmers to copy and paste
>    with a large structure mostly containing NULLs, which they must
>    pick carefully through to change just a few fields.
>
>
> THE SOLUTION
>
> My patch creates a new function-based extension type definition API.
> You create a type by calling PyType_New(), then call various accessor
> functions on the type (PyType_SetString and the like), and when your
> type has been completely populated you must call PyType_Activate()
> to enable it for use.
>
> With this API available, extension authors no longer need to directly
> see the innards of the PyTypeObject structure.  Well, most of the
> fields anyway.  There are a few shortcut macros in CPython that need
> to continue working for performance reasons, so the "tp_flags" and
> "tp_dealloc" fields need to remain publically visible.
>
> One feature worth mentioning is that the API is type-safe.  Many such
> APIs would have had one generic "PyType_SetPointer", taking an
> identifier for the field and a void * for its value, but this would
> have lost type safety.  Another approach would have been to have one
> accessor per field ("PyType_SetAddFunction"), but this would have
> exploded the number of functions in the API.  My API splits the
> difference: each distinct *type* has its own set of accessors
> ("PyType_GetSSizeT") which takes an identifier specifying which
> field you wish to get or set.
>
>
> SIDE-EFFECTS OF THE API
>
> The major change resulting from this API: all PyTypeObjects must now
> be *pointers* rather than static instances.  For example, the external
> declaration of PyType_Type itself changes from this:
>   PyAPI_DATA(PyTypeObject) PyType_Type;
> to this:
>   PyAPI_DATA(PyTypeObject *) PyType_Type;
>
> This gives rise to the first headache caused by the API: type casts
> on type objects.  It took me a day and a half to realize that this,
> from Modules/_weakref.c:
>       PyModule_AddObject(m, "ref",
>                          (PyObject *) &_PyWeakref_RefType);
> really needed to be this:
>       PyModule_AddObject(m, "ref",
>                          (PyObject *) _PyWeakref_RefType);
>
> Hopefully I've already found most of these in CPython itself, but
> this sort of code surely lurks in extensions yet to be touched.
>
> (Pro-tip: if you're working with this patch, and you see a crash,
> and gdb shows you something like this at the top of the stack:
>   #0  0x081056d8 in visit_decref (op=0x8247aa0, data=0x0)
>                  at Modules/gcmodule.c:323
>   323             if (PyObject_IS_GC(op)) {
> your problem is an errant &, likely on a type object you're passing
> in to the interpreter.  Think--what did you touch recently?  Or debug
> it by salting your code with calls to collect(NUM_GENERATIONS-1).)
>
>
> Another irksome side-effect of the API: because of "tp_flags" and
> "tp_dealloc", I now have two declarations of PyTypeObject.  There's
> the externally-visible one in Include/object.h, which lets external
> parties see "tp_dealloc" and "tp_flags".  Then there's the internal
> one in Objects/typeprivate.h which is the real structure.  Since
> declaring a type twice is a no-no, the external one is gated on
>   #ifndef PY_TYPEPRIVATE
> If you're a normal Python extension programmer, you'd include Python.h
> as normal:
>   #include "Python.h"
> Python implementation files that need to see the real PyTypeObject
> structure now look like this:
>   #define PY_TYPEPRIVATE
>   #include "Python.h"
>   #include "../Objects/typeprivate.h"
>
> Also, since the structure of PyTypeObject hasn't yet changed, there
> are a bunch of fields in PyTypeObject that are externally visible that
> I don't want to be visible.  To ensure no one was using them, I renamed
> them to "mysterious_object_0" and "mysterious_object_1" and the like.
> Before this patch gets accepted, I want to reorder the fields in
> PyTypeObject (which we can! because it's private!) so that these public
> fields are at the top of the both the external and internal structures.
>
>
> THE UPGRADE PATH
>
> Python internally declares a great many types, and I haven't attempted
> to convert them all.  Instead there's an conversion header file that
> does most of the work for you.  Here's how one would apply it to an
> existing type.
>
> 1. Where your file currently has this:
>   #include "Python.h"
>  change it to this:
>   #define PY_TYPEPRIVATE
>   #include "Python.h"
>   #include "pytypeconvert.h"
>
> 2. Whenever you declare a type, change it from this:
>   static PyTypeObject YourExtension_Type = {
>  to this:
>   static PyTypeObject *YourExtension_Type;
>   static PyTypeObject _YourExtension_Type = {
>
>  Use NULL for your metaclass.  For example, change this:
>   PyObject_HEAD_INIT(&PyType_Type),
>  to this:
>   PyObject_HEAD_INIT(NULL),
>
>  Also use NULL for your baseclass.  For example, change this:
>   &PyDict_Type, /* tp_base */
>  to this:
>   NULL, /* tp_base */
>  setting it to NULL instead.
>
> 3. In your module's init function, add this:
>   CONVERT_TYPE(YourExtension_Type,
>       metaclass, baseclass, "description of type");
>  "metaclass" and "baseclass" should be the metaclass and baseclass
>  for your type, the ones you just set to NULL in step 3.  If you
>  had NULL before the baseclass, use NULL here too.
>
> 4. If you have any static object declarations, set their ob_type to
>  NULL in the static declaration, then set it explicitly in your
>  init function.  If your object uses a locally-defined type,
>  be sure to do this *after* the CONVERT_TYPE line for that type.
>  (See _Py_EllipsisObject for an example.)
>
> 5. Anywhere you're using existing Python type declarations
>  you must remove the & from the front.
>
> The conversion header file *also* redefines PyTypeObject.  But this
> time it redefines it to the existing definition, and that definition
> will stay the same forever.  That's the whole point: if you have an
> existing Python 3.0 extension, it won't have to change if we change
> the internal definition of PyTypeObject.
>
> (Why bother with this conversion process, with few py3k extensions
> in the wild?  This patch was started quite a while ago, when it
> seemed plausible the API would get backported to 2.x.  Now I'm not
> so sure that will happen.)
>
>
>
>
> THE CURRENT PATCH
>
> I've uploaded a patch to the tracker:
>   http://bugs.python.org/issue5872
> It applies cleanly to py3k/trunk (r72081).  But the code is awfully
> grubby.
>
> * I haven't dealt with any types I can't build, and I can't build
>  a lot of the extensions.  I'm using Linux, and I don't have the
>  dev headers for many libraries on my laptop, and I haven't touched
>  Windows or Mac stuff.
>
> * I created some new build warnings which should obviously be fixed.
>
> * With the patch installed, py3k trunk builds and installs.  It does
>  *not* pass the regression test suite.  (It crashes.)  I don't think
>  this'll be too bad, it's just taken me this long to get it as far
>  as I have.
>
> * There are some internal scaffolds and hacks that should be purged
>  by the final patch.
>
> * There's no documentation.  If you'd like to see how you'd use the
>  new API, currently the best way to learn is to read
>  Include/pytypeconvert.h.
>
> * I don't like the PY_TYPEPRIVATE hack.  I only used it 'cause it
>  sucks less than the other approaches I've thought of.  I welcome
>  your suggestions.
>
>  The second-best approach I've come up with: make PyTypeObject
>  genuinely private, and declare a different structure containing just
>  the head of PyTypeObject.   Let's call it PyTypeObjectHead.  Then,
>  for the convenience macros that use "dealloc" and "flags", cast the
>  object to PyTypeObjectHead before dereferencing.  This abandons type
>  safety, and given my longing for type safety while developing this
>  patch I'd prefer to not make loss of type safety an official API.
>
> THE FEEDBACK I SEEK
>
> My understanding is that the feature-freeze for Python 3.1 is in a
> little over a week.  Given the current stability level and untestedness
> of the patch, and the lateness of the hour... is there any chance this
> would be accepted into Python 3.1?  If so, I'll need to act fast.  If
> not, I might as well take it relax, huh.
>
>
> My thanks to Neal Norwitz for suggesting this project, and Brett Cannon
> for some recent encouragement.  (And another person who I discussed it
> with so long ago I forgot who it was... maybe Fredik Lundh?)
>
>
> /larry/


+1

I haven't looked at your code so I can't comment on the API itself... But
awesome.  I like the general idea.  Exposing structures has hampered us for
quite a while with forwards API compatability.

I predict not enough people are available to drive this to adoption and use
for Python 3.1 given the time frame (the beta feature freeze happens this
Saturday I believe?) but we should make this happen for 3.2 and get it
stable and into in trunk soon after release-31maint branch is created.

Whats needed?  Perhaps a PEP describing a lot of what you started to write
up in this email: the new extension module API with sections on the upgrade
path and backwards compatibillity story.

Extension modules are often maintained such that they work on all versions
of Python from 2.3 or 2.4 on up to 3.x.  We should provide a decent way to
do that.  Could some of these API functions be provided as a rarely changing
add on .c/.h file for extension module authors to bundle as part of their
extension modules for use with older versions of python to avoid big #ifdefs
around structure definitions vs initialization API calls?

-gps
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20090430/f69d8a0f/attachment-0001.htm>


More information about the Python-Dev mailing list