Extending types in C - help needed

In the discussion on my request for an ("O@", typeobject, void **) format for PyArg_Parse and Py_BuildValue, MAL suggested that I could get the same functionality by creating a type WrapperTypeObject, which would be a subtype of TypeObject with extra fields pointing to the _New() and _Convert() routines to convert Python objects from/to C pointers.
This would be good enough for me: types wanting to participate in the wrapper protocol would subtype WrapperTypeObject instead of TypeObject, two global routines could return the _New and _Convert routines given the type object, and we wouldn't need yet another PyArg_Parse format specifier.
However, after searching high and low I haven't been able to work out how I would then use this WrapperType in C as the type for my extension module objects. Are there any examples? If not, could someone who understands the new inheritance scheme give me some clues as to how to do this?
-- Jack Jansen <Jack.Jansen@oratrix.com> http://www.cwi.nl/~jack
   If I can't dance I don't want to be part of your revolution -- Emma Goldman
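The layout being asked about can be modelled in plain C. The sketch below is only an illustration under stated assumptions: ToyType, ToyWrapperType, Toy_GetNew and Toy_GetConvert are hypothetical names, not CPython APIs, and the toy type object is a fixed-size struct, which (as the replies further down explain) the real type object is not.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a type object; not the CPython struct. */
typedef struct ToyType {
    const char *name;
    int is_wrapper;              /* set when the extra fields below exist */
} ToyType;

typedef struct ToyObject {
    const ToyType *type;
    void *ptr;                   /* the wrapped C pointer */
} ToyObject;

typedef ToyObject *(*new_func)(void *);
typedef int (*convert_func)(ToyObject *, void **);

/* "Subtype" of ToyType: the base comes first, the extra fields follow. */
typedef struct ToyWrapperType {
    ToyType base;
    new_func wrap_new;           /* C pointer -> object */
    convert_func wrap_convert;   /* object -> C pointer */
} ToyWrapperType;

/* The two global routines: return the converter, or NULL if the type
   does not take part in the wrapper protocol. */
new_func Toy_GetNew(const ToyType *t)
{
    assert(t != NULL);
    return t->is_wrapper ? ((const ToyWrapperType *)t)->wrap_new : NULL;
}

convert_func Toy_GetConvert(const ToyType *t)
{
    assert(t != NULL);
    return t->is_wrapper ? ((const ToyWrapperType *)t)->wrap_convert : NULL;
}

/* Example participant: a window type wrapping an opaque pointer. */
static ToyObject *window_new(void *p);
static int window_convert(ToyObject *o, void **out);

ToyWrapperType Window_Type = { { "Window", 1 }, window_new, window_convert };
ToyType Plain_Type = { "Plain", 0 };

static ToyObject the_window;     /* a single instance, for brevity */

static ToyObject *window_new(void *p)
{
    the_window.type = &Window_Type.base;
    the_window.ptr = p;
    return &the_window;
}

static int window_convert(ToyObject *o, void **out)
{
    if (o->type != &Window_Type.base)
        return -1;               /* wrong type: refuse to convert */
    *out = o->ptr;
    return 0;
}
```

A caller that only has the type object asks Toy_GetNew or Toy_GetConvert for the routine and gets NULL for types such as Plain_Type that do not participate.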

Jack Jansen wrote:
> In the discussion on my request for an ("O@", typeobject, void **) format for PyArg_Parse and Py_BuildValue MAL suggested
Thomas Heller suggested this. I am more in favour of exposing the pickle reduce API through "O@", that is, having PyArg_ParseTuple() call the .__reduce__() method of the object. This would return (factory, state_tuple), and these could then be exposed to the C function via two PyObject*. Note that there's no need for any type object magic. If this becomes a common case, it may be worthwhile to add a tp_reduce slot to type objects, though.
-- Marc-Andre Lemburg
   CEO eGenix.com Software GmbH
   Company & Consulting: http://www.egenix.com/
   Python Software: http://www.egenix.com/files/python/

On Thursday, January 17, 2002, at 11:29 AM, M.-A. Lemburg wrote:
Oops, you're right. I should be careful not to mix up my Germans;-)
You've suggested this before, but at that time I ignored it because it made absolutely no sense to me. "pickle" triggers one set of ideas for me, "reduce" triggers a different set, "factory function" yet another different set. None of these sets of ideas bears the least resemblance to what I'm trying to do:-) I gave a fairly complete example (using calldll from Python to wrap a function that returns a Mac WindowObject) last week. Could you explain how you would implement this with pickle, reduce and factory functions?
-- Jack Jansen

Jack Jansen wrote:
The idea is simple but extends what you are trying to achieve (I gave an example of how to use this somewhere in the "wrapper" thread). Basically, you'll just want to use the state tuple to access the underlying void* C pointer via a PyCObject which does the wrapping of the pointer. The "pickle" mechanism would store the PyCObject in the state tuple, which you could then access to get at the C pointer. This may sound complicated at first, but it provides much more flexibility with respect to more complex objects: the method you have in mind only supports wrapping a single C pointer, whereas the "pickle" mechanism can potentially handle any serializable object.
Sorry, no time for that ... I've got an important business trip next week which needs to be prepared. Please bring this up again after next week.
-- Marc-Andre Lemburg
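Since no worked example of the reduce-based approach appears in this message, here is one possible reading of it as a plain-C model. Everything in it is an assumption made for illustration: Obj, Reduced, window_factory, extract_pointer and rebuild are hypothetical names, the `reduce` member stands in for the suggested tp_reduce slot, and the `state` pointer stands in for a PyCObject stored in the state tuple.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct Obj Obj;
typedef Obj *(*factory_func)(void *state);

/* Hypothetical result of a C-level "reduce": a factory plus its state. */
typedef struct Reduced {
    factory_func factory;
    void *state;                    /* models the wrapped C pointer */
} Reduced;

struct Obj {
    void *c_pointer;
    Reduced (*reduce)(Obj *self);   /* toy counterpart of a tp_reduce slot */
};

static Reduced window_reduce(Obj *self);

/* Factory: builds a new object around the pointer found in the state. */
static Obj *window_factory(void *state)
{
    Obj *o = malloc(sizeof *o);
    if (o == NULL)
        return NULL;
    o->c_pointer = state;
    o->reduce = window_reduce;
    return o;
}

static Reduced window_reduce(Obj *self)
{
    Reduced r = { window_factory, self->c_pointer };
    return r;
}

/* Object -> C pointer: ask the object to reduce itself, read the state. */
void *extract_pointer(Obj *o)
{
    assert(o != NULL);
    return o->reduce(o).state;
}

/* C pointer -> object: call the factory on the state. */
Obj *rebuild(Reduced r)
{
    return r.factory(r.state);
}
```

In this model rebuild() allocates a fresh object on every call, so two objects can end up wrapping the same C pointer.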

On Friday, January 18, 2002, at 10:47 , M.-A. Lemburg wrote:
I think you're missing a few points here. First of all, my objects aren't PyCObjects but other extension objects. While the main pointer in the object could be wrapped in a PyCObject, there may be other information in my objects that is important, such as a pointer to the dispose routine to call on the C pointer when the Python object reaches refcount zero (and this pointer may change over time as ownership of, say, a button is passed from Python to the system). The _New and _Convert routines will know how to get from the C pointer to the *correct* object, i.e. normally there will be only one Python object for every C object.
Also, the method seems rather complicated for doing a simple thing. The only thing I really want is a way to refer to an _New or _Convert method from Python code. The most reasonable way to do that seems to be by creating a way to get from the type object (which is available in Python) to those routines. Thomas' suggestion looked very promising, and simple too, until Guido said that unfortunately it couldn't be done. Your suggestion, as far as I understand it, looks complicated and probably inefficient too (remember the code will have to go through all these hoops every time it needs to convert an object from Python to C or vice versa).
Correct me if I'm wrong,
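The two requirements named above (one wrapper per C object, and a dispose routine that can change when ownership moves) might look roughly like this in a plain-C model. This is a sketch under assumptions, not the MacPython code: ToyWrapper, Toy_New, Toy_Convert, Toy_Release and the fixed-size table are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_WRAPPERS 16

typedef void (*dispose_func)(void *);

typedef struct ToyWrapper {
    void *c_pointer;
    dispose_func dispose;   /* NULL once the system owns the C object */
    int in_use;
} ToyWrapper;

static ToyWrapper wrappers[TOY_MAX_WRAPPERS];

/* Sample dispose routine that only counts its calls. */
int toy_dispose_calls = 0;
void toy_count_dispose(void *p)
{
    (void)p;
    toy_dispose_calls++;
}

/* _New: C pointer -> wrapper. An existing wrapper for the same pointer
   is reused, so there is at most one wrapper per C object. */
ToyWrapper *Toy_New(void *p, dispose_func dispose)
{
    ToyWrapper *free_slot = NULL;
    size_t i;

    assert(p != NULL);
    for (i = 0; i < TOY_MAX_WRAPPERS; i++) {
        if (wrappers[i].in_use && wrappers[i].c_pointer == p)
            return &wrappers[i];
        if (!wrappers[i].in_use && free_slot == NULL)
            free_slot = &wrappers[i];
    }
    if (free_slot == NULL)
        return NULL;            /* table full */
    free_slot->c_pointer = p;
    free_slot->dispose = dispose;
    free_slot->in_use = 1;
    return free_slot;
}

/* _Convert: wrapper -> C pointer. */
int Toy_Convert(const ToyWrapper *w, void **out)
{
    if (w == NULL || !w->in_use)
        return -1;
    *out = w->c_pointer;
    return 0;
}

/* What deallocation would do: dispose only if we still own the object. */
void Toy_Release(ToyWrapper *w)
{
    assert(w != NULL);
    if (w->dispose != NULL)
        w->dispose(w->c_pointer);
    w->in_use = 0;
}
```

Setting `dispose` to NULL before Toy_Release models handing a button over to the system: the wrapper goes away but the C object is left alone.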

Jack Jansen wrote:
I know. The idea is that either you add a .__reduce__ method to the extension objects or register their types with a registry comparable to copyreg.
Note that PyCObjects support all of this. It's not important in this context, though. The PyCObject is only used to wrap the raw pointer; the factory function then takes this pointer and creates one of your extension objects out of it.
That's also possible using the "pickle" approach.
It is more complicated, but also more flexible. Plus it builds on techniques which are already applied in Python's pickle mechanism. Note that by adding a tp_reduce slot, the overhead of calling a Python function could be kept reasonable. Helper functions could aid in accessing the C pointer which is stored in the state tuple.
-- Marc-Andre Lemburg

I believe the attached code implements your requirements. In particular, see PyArg_GenericCopy for an application that extracts a void* from an object through a type-safe protocol, then creates a clone of the original object through the same protocol. Both extractor and creator function are associated with the type object. To see this work in Python, run
Regards, Martin

#include "Python.h"

/************* Generic Converters ***************/

struct converters {
    PyObject* (*create)(void*);
    int (*extract)(PyObject*, void**);
};

/* The address of this string identifies CObjects created here. */
char descr_string[] = "calldll converter structure";

void
PyArg_AddConverters(PyTypeObject *type, struct converters* convs)
{
    PyObject *cobj = PyCObject_FromVoidPtrAndDesc(convs, descr_string, NULL);
    PyDict_SetItemString(type->tp_dict, "__calldll__", cobj);
    Py_DECREF(cobj);
}

struct converters*
PyArg_GetConverters(PyTypeObject *type)
{
    PyObject *cobj;
    void *descr;
    struct converters *convs;

    cobj = PyObject_GetAttrString((PyObject*)type, "__calldll__");
    if (!cobj)
        return NULL;
    descr = PyCObject_GetDesc(cobj);
    if (!descr) {
        Py_DECREF(cobj);
        return NULL;
    }
    if (descr != descr_string) {
        Py_DECREF(cobj);
        PyErr_SetString(PyExc_TypeError, "invalid cobj");
        return NULL;
    }
    convs = (struct converters*)PyCObject_AsVoidPtr(cobj);
    /* GetAttrString returned a new reference; the owning type's dict
       still holds the CObject, so convs stays valid. */
    Py_DECREF(cobj);
    return convs;
}

PyObject *
PyArg_Create(PyTypeObject* type, void * value)
{
    struct converters *convs = PyArg_GetConverters(type);
    if (!convs)
        return NULL;
    return convs->create(value);
}

int
PyArg_Extract(PyObject* obj, void** value)
{
    struct converters *convs = PyArg_GetConverters(obj->ob_type);
    if (!convs)
        return -1;
    /* Propagate the extractor's own success/failure result. */
    return convs->extract(obj, value);
}

PyObject*
PyArg_GenericCopy(PyObject* obj)
{
    void *tmp;
    if (PyArg_Extract(obj, &tmp))
        return NULL;
    return PyArg_Create(obj->ob_type, tmp);
}

/************* End Generic Converters ***************/

typedef struct {
    PyObject_HEAD
    int handle;
} HandleObject;

staticforward PyTypeObject Handle_Type;

#define HandleObject_Check(v) ((v)->ob_type == &Handle_Type)

static HandleObject *
newHandleObject(int i)
{
    HandleObject *self;
    self = PyObject_New(HandleObject, &Handle_Type);
    if (self == NULL)
        return NULL;
    self->handle = i;
    return self;
}

/* Handle methods */

static void
Handle_dealloc(HandleObject *self)
{
    PyObject_Del(self);
}

/**************** Generic Converters: Handle support ***************/

static PyObject*
handle_conv_new(void *s)
{
    /* The handle travels inside the pointer value itself. */
    return (PyObject*)newHandleObject((int)(long)s);
}

static int
handle_conv_extract(PyObject *o, void **dest)
{
    HandleObject *h = (HandleObject*)o;
    *dest = (void*)(long)h->handle;
    return 0;
}

struct converters HandleConvs = {
    handle_conv_new,
    handle_conv_extract
};

/**************** Generic Converters: Handle support ***************/

statichere PyTypeObject Handle_Type = {
    /* The ob_type field must be initialized in the module init function
     * to be portable to Windows without using C++. */
    PyObject_HEAD_INIT(NULL)
    0,                          /*ob_size*/
    "handle.Handle",            /*tp_name*/
    sizeof(HandleObject),       /*tp_basicsize*/
    0,                          /*tp_itemsize*/
    /* methods */
    (destructor)Handle_dealloc, /*tp_dealloc*/
    0,                          /*tp_print*/
    0,                          /*tp_getattr*/
    0,                          /*tp_setattr*/
    0,                          /*tp_compare*/
    0,                          /*tp_repr*/
    0,                          /*tp_as_number*/
    0,                          /*tp_as_sequence*/
    0,                          /*tp_as_mapping*/
    0,                          /*tp_hash*/
    0,                          /*tp_call*/
    0,                          /*tp_str*/
    0,                          /*tp_getattro*/
    0,                          /*tp_setattro*/
    0,                          /*tp_as_buffer*/
    Py_TPFLAGS_DEFAULT,         /*tp_flags*/
};

/* --------------------------------------------------------------------- */

static PyObject *
xx_new(PyObject *self, PyObject *args)
{
    HandleObject *rv;
    int h;

    if (!PyArg_ParseTuple(args, "i:new", &h))
        return NULL;
    rv = newHandleObject(h);
    if (rv == NULL)
        return NULL;
    return (PyObject *)rv;
}

static PyObject *
xx_copy(PyObject *self, PyObject *args)
{
    PyObject *obj;

    if (!PyArg_ParseTuple(args, "O:copy", &obj))
        return NULL;
    return PyArg_GenericCopy(obj);
}

static PyMethodDef xx_methods[] = {
    {"new",  xx_new,  METH_VARARGS},
    {"copy", xx_copy, METH_VARARGS},
    {NULL,   NULL}      /* sentinel */
};

DL_EXPORT(void)
inithandle(void)
{
    PyObject *m;

    Handle_Type.ob_type = &PyType_Type;
    /* tp_dict only exists after PyType_Ready has succeeded. */
    if (PyType_Ready(&Handle_Type) < 0)
        return;
    PyArg_AddConverters(&Handle_Type, &HandleConvs);

    /* Create the module and add the functions */
    m = Py_InitModule("handle", xx_methods);
}

Indeed. I also think it is more appropriate than either a new metatype or a ParseTuple extension for the problem at hand (supporting arbitrary types in calldll), for the following reasons:
- There may be different ways in which an object converts to a "native" type. In particular, in some cases, ParseTuple may need to return (fill out) something more complex than a void*, something that calldll cannot support by nature.
- A type may need to provide various independent extensions to the standard protocols, e.g. it may provide "give me a Unicode doc string" in addition to "give me a conversion function to void*". In this case, you'd need multiple inheritance on the metatype level, something that does not reflect well in C.
For Python, it is much more common not to care at all about inheritance. Instead, just access the protocol, and expect an exception if it is not supported.
Also notice that this *does* make use of new-style classes: in 2.1, types did not have a tp_dict slot. Of course, the PyType_Ready call should go immediately before the place where tp_dict is accessed, and a check should be added whether tp_flags contains Py_TPFLAGS_HAVE_CLASS.
Regards, Martin

> Wouldn't it suffice to check for tp_dict != NULL (after the call to PyType_Ready of course)?
No, see below (although I must admit that I wrote "Right" here first :-)
> Hm. What does Py_TPFLAGS_HAVE_CLASS mean exactly?
According to the documentation, it means that the underlying TypeObject structure has the necessary fields in its C declaration.
> Or, better, since TPFLAGS_DEFAULT contains TPFLAGS_HAVE_CLASS, what does it mean when Py_TPFLAGS_HAVE_CLASS is NOT in tp_flags?
It means you have loaded a module built for an earlier Python version, which had a different setting for Py_TPFLAGS_DEFAULT and a shorter definition of the TypeObject. If you try to access tp_dict in such an object, you are accessing random memory. This may crash immediately, or only when you pass the pointer you got to the dictionary functions.
Regards, Martin
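The reason a NULL test on tp_dict is not enough can be shown with a small plain-C model of a struct that grew between versions. The names here (TypeHead, OldType, NewType, TOY_HAVE_CLASS, toy_get_dict) are made up for the illustration and the flag value is arbitrary; only the idea of "check the flag before touching the newer field" is taken from the explanation above.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_HAVE_CLASS (1L << 8)   /* hypothetical flag bit */

/* Fields that every version of the type struct has. */
typedef struct TypeHead {
    const char *name;
    long flags;
} TypeHead;

/* Layout compiled into an old extension module: nothing after the head. */
typedef struct OldType {
    TypeHead head;
} OldType;

/* Current layout: same head, plus a field added later. */
typedef struct NewType {
    TypeHead head;
    void *dict;
} NewType;

/* Returns the dict, or NULL when the type predates the field.
   The flag must be tested first: for an OldType, the bytes where
   `dict` would live belong to whatever follows it in memory, so
   reading them and comparing against NULL proves nothing. */
void *toy_get_dict(const TypeHead *t)
{
    assert(t != NULL);
    if (!(t->flags & TOY_HAVE_CLASS))
        return NULL;
    return ((const NewType *)t)->dict;
}
```

The function never reads past the end of an old-layout struct, because the flag lives in the part both layouts share.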

On Friday, January 18, 2002, at 08:36 PM, Martin v. Loewis wrote:
>> Also, the method seems rather complicated for doing a simple thing. The
>> only thing I really want is a way to refer to an _New or _Convert method
>> from Python code.
>
> I believe the attached code implements your requirements.
Martin, hats off! This does exactly what I want, and it does so in a pretty generalized way. Actually in _such_ a generalized way that I think this should be documented loud and clear.
Looking at it a bit more, how about storing each function pointer in a separate PyCObject, and adding general APIs somewhere in the core:
    void PyType_SetAnnotation(PyTypeObject *tp, char *name, char *descr, void *);
    void *PyType_GetAnnotation(PyTypeObject *tp, char *name, char *descr);
(I've picked the name annotation here, because it sort-of feels like that; another name may bring the idea across better.)
-- Jack Jansen

Thanks!
I'll happily add that to some recipe collection. However, before generalizing it, I'd like to see more use cases. There should, at least, be a *second* application beyond calldll (or, perhaps, even beyond MacPython). Generalizing from a single use case is not good.
Regards, Martin

[Martin's PyCObject based Handle object]
This seems to be very close to the __reduce__ idea I posted on this thread a couple of days ago. Why not extend it to fully support this standard Python protocol?
-- Marc-Andre Lemburg

Because it is not clear to me what specifically the semantics of this protocol are. I wrote it to support MacOS calldll. I cannot see applicability beyond this API. One of the strengths of OO and polymorphism is precisely that users can freely extend the protocols that their objects support, without requiring *all* objects to support the protocol. A standard protocol should be clearly useful cross-platform, for many different types, in different applications.
Regards, Martin

From: "Jack Jansen" <Jack.Jansen@oratrix.nl>
Currently (after quite some time) I have the impression that you cannot create a subtype of PyType_Type in C because PyType_Type ends in a variable-sized array, at least not in this way:
    struct { PyTypeObject type; ...additional fields... } WrapperType_Type;
Can someone confirm this? (I have to find out what to do with the tp_members slot, which seems to correspond to the Python-level __slots__ class variable.)
Thomas

Yes, alas. The type you would have to declare is 'etype', a private type in typeobject.c. --Guido van Rossum (home page: http://www.python.org/~guido/)

I wish I had time to explain this, but I don't. For now, you'll have to read how types are initialized in typeobject.c -- maybe there's a way, maybe there isn't.
Any tips about the route to take?
It can be done easily dynamically.
--Guido van Rossum

From: "Guido van Rossum" <guido@python.org>
I'm still struggling with this. How can it be done dynamically? My idea would be to realloc() the object after creation, adding a few bytes at the end. The problem is that I don't know how to find out the object size without knowledge of the internals. The formula given in PEP 253,
    type->tp_basicsize + nitems * type->tp_itemsize
seems not to be valid any more (at least with cycle GC).
Thomas
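For reference, the quoted formula is a one-liner, and one plausible reason it stops matching the allocated block is a collector header placed in front of the object. The sketch below is a model only: ToyType, ToyGCHead and both function names are hypothetical, and the three-field header is an assumption about the shape of such a header, not a copy of the real one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type record with just the fields the formula needs. */
typedef struct ToyType {
    size_t basicsize;    /* fixed part of each instance */
    size_t itemsize;     /* size of one variable-length item */
    int has_gc;          /* instances are tracked by the collector */
} ToyType;

/* Assumed bookkeeping header that a collector puts before the object. */
typedef struct ToyGCHead {
    void *next;
    void *prev;
    long refs;
} ToyGCHead;

/* Size of the object itself, as given by the formula. */
size_t toy_object_size(const ToyType *t, size_t nitems)
{
    assert(t != NULL);
    return t->basicsize + nitems * t->itemsize;
}

/* Size of the memory block actually obtained from the allocator. */
size_t toy_block_size(const ToyType *t, size_t nitems)
{
    size_t size = toy_object_size(t, nitems);
    if (t->has_gc)
        size += sizeof(ToyGCHead);
    return size;
}
```

In this model a realloc() based on toy_object_size alone would be both too small and applied to the wrong address for collected objects, since the block starts at the header rather than at the object.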

I have thought about this a little more and come to the conclusion that you cannot define a metaclass that creates type objects that have more C slots than the standard type object layout. It would be the same as trying to add a C slot to the instances of a string subtype: there's variable-length data at the end, and you cannot place anything *before* that variable-length data because all the C code that works with the base type knows where the variable-length data starts; you cannot place anything *after* that variable-length data because there's no way to address it from C.
The only way around this would be to duplicate *all* the code of type_new(), which I don't recommend because it will probably have to be changed for each Python version (even for bugfix releases). A better solution is to store additional information in the __dict__.
--Guido van Rossum
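The "cannot place anything before the variable-length data" half of this argument can be seen in a few lines of plain C. This is a toy model, assuming a base struct that ends in a C99 flexible array member; VarBase, NaiveDerived, base_fill and extra_overlaps_items are invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Base layout: a fixed header followed by a variable number of items. */
typedef struct VarBase {
    size_t nitems;
    int items[];                 /* C99 flexible array member */
} VarBase;

/* What a naive "subclass" would look like: the header, then one extra
   field. (C does not allow embedding VarBase itself as a member.) */
typedef struct NaiveDerived {
    size_t nitems;
    int extra;
} NaiveDerived;

/* Code written for the base type: it knows where the items start. */
void base_fill(VarBase *b, int value)
{
    size_t i;
    assert(b != NULL);
    for (i = 0; i < b->nitems; i++)
        b->items[i] = value;
}

/* Returns 1 if filling the items overwrote the bytes where the derived
   struct would keep `extra`, 0 if not, -1 on allocation failure. */
int extra_overlaps_items(void)
{
    int extra = 0;
    VarBase *b = calloc(1, sizeof(VarBase) + 4 * sizeof(int));
    if (b == NULL)
        return -1;
    b->nitems = 4;
    base_fill(b, 42);
    /* Read the bytes at the offset the derived struct uses for `extra`. */
    memcpy(&extra, (char *)b + offsetof(NaiveDerived, extra), sizeof extra);
    free(b);
    return extra == 42;
}
```

The extra field and items[0] share the same offset, so any base-type code that touches the items silently rewrites the field.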

On Wed, Feb 06, 2002 at 09:36:27AM -0500, Guido van Rossum wrote:
I had a half-baked idea when I read this. Is there something unworkable about the scheme, aside from being very different from the way Python currently operates? Has anybody written a system that works this way? Is it just plain gross?
Jeff Epler jepler@inetnebr.com
Half-Baked Idea
---------------
The problem is that we have variable-length types. For example,
    struct S { int nelem; int elem[0]; };
you can allocate a new one by
    struct S *new_S(int nelem) {
        struct S *ret = malloc(sizeof(struct S) + nelem * sizeof(int));
        ret->nelem = nelem;
        return ret;
    }
Normally, we "subclass" structures by appending fields to the end:
    struct BASE { int x, y; };
    struct DERIVED { /* from struct BASE */ int x, y; int flag; };
but this doesn't work with a dynamic-length object. So, with the caveat that you can only have dynamic-length behavior in the base class, why not place the new fields *BEFORE* the fields of the base struct:
    struct S2 { int flag; int nelem; int elem[0]; };
Now, whenever you are going to pass S2 to a function on S, you simply pass in
    (struct S*)((char*)s2 + offsetof(struct S2, nelem))
and if you're faced with an instance of S that turns out to be an S2, you can get the pointer to the start of the S2 with
    (struct S2*)((char*)s - offsetof(struct S2, nelem))
Note that neither of these is an additional level of indirection, it's just an offset calculation, one that your compiler may be able to combine with subsequent field accesses through the -> operator.
But how do you free an instance of S-or-subclass, without knowing all the subclasses? Well, you could store a pointer to the real start of the structure, or an offset back to it, in the structure. You'd use that pointer only on a few occasions, usually using the "add constant to pointer" calculation in functions which are for a particular subclass of S:
    struct S { void *real_head; int nelem; int elem[0]; };
    struct S1 { /* derived from S */ int flag; void *real_head; int nelem; int elem[0]; };
    struct S1_1 { /* derived from S1 */ int new_flag; int flag; void *real_head; int nelem; int elem[0]; };
Now, you can allocate a version of an S subclass by
    struct S *new_S(int nelem, int pre_size) {
        char *mem = malloc(sizeof(struct S) + nelem * sizeof(int) + pre_size);
        struct S *ret;
        if (mem == NULL)
            return NULL;
        ret = (struct S *)(mem + pre_size);
        ret->real_head = mem;   /* remember the true start of the block */
        ret->nelem = nelem;
        return ret;
    }
and free it by
    void free_S(struct S* s) { free(s->real_head); }
I don't know how this will interact with a garbage collector, but it does maintain a pointer to the head of the allocated block, though that pointer is only accessible through a pointer to the inside of a block.

[Idea about extending variable-length structures at the front instead of at the back]
The problem with applying this idea to Python objects, IMO, is that Python requires the object header to be at the start. Anything operating on a PyObject * expects that it can use the Py_INCREF and Py_DECREF macros, and those expect the refcount to be the first field and the type pointer to be the second. So our objects are already constrained at the front.
Also, the GC implementation already uses this trick: it adds three fields in front of the structure. But then it assumes you can use fixed address calculations to translate between the object and the GC header. Adding something in front of the GC header would be too painful.
--Guido van Rossum
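The "fixed address calculations" mentioned here amount to stepping one header-size backwards or forwards from a pointer. A minimal plain-C model follows, assuming a three-field header as described above; ToyGCHead, ToyObject, the two macros and the allocation helpers are illustrative names, not the interpreter's own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical collector header that sits in front of every object. */
typedef struct ToyGCHead {
    struct ToyGCHead *next;
    struct ToyGCHead *prev;
    long refs;
} ToyGCHead;

/* Object header: refcount first, type pointer second. */
typedef struct ToyObject {
    long refcnt;
    const void *type;
} ToyObject;

/* Fixed offsets in both directions: one header back, one header forward. */
#define TOY_AS_GC(o)   ((ToyGCHead *)(o) - 1)
#define TOY_FROM_GC(g) ((ToyObject *)((ToyGCHead *)(g) + 1))

/* Allocate header plus object; callers only ever see the object pointer. */
ToyObject *toy_gc_new(size_t basicsize)
{
    ToyGCHead *g;
    ToyObject *o;

    assert(basicsize >= sizeof(ToyObject));
    g = malloc(sizeof(ToyGCHead) + basicsize);
    if (g == NULL)
        return NULL;
    g->next = NULL;
    g->prev = NULL;
    g->refs = 0;
    o = TOY_FROM_GC(g);
    o->refcnt = 1;
    o->type = NULL;
    return o;
}

/* Free through the header, which is the true start of the block. */
void toy_gc_del(ToyObject *o)
{
    assert(o != NULL);
    free(TOY_AS_GC(o));
}
```

Because both macros hard-code the distance as exactly one header, inserting per-type fields between the header and the object, or in front of the header, would break every caller that relies on them.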

It's a pity, isn't it?
> A better solution is to store additional information in the __dict__.
You lose nice features, for example the ability to access these (new) slots from Python by providing tp_members entries for them. Are you planning to address this issue in the future?
Thanks, Thomas

[Guido]
A better solution is to store additional information in the __dict__.
[Thomas]
You lose nice features: access these (new) slots from Python by providing tp_members entries for them (for example).
This thread is IMO closed; just for completeness I want to mention that the same effect can be accomplished easily with tp_getset.
Thomas

On Wednesday, February 6, 2002, at 09:53 PM, Thomas Heller wrote:
Martin pointed at a way to solve this. And I think that with my proposed API (... where is it..., ah yes, found it)
    void PyType_SetAnnotation(PyTypeObject *tp, char *name, void *unique, void *);
    void *PyType_GetAnnotation(PyTypeObject *tp, char *name, void *unique);
it would be almost as easy to use as a tp_ slot. The only thing needed to make it 100% safe is a registry for name/descr pairs. (Actually the API has changed a little now that I understand how the second arg works.)
For the benefit of whoever missed the previous thread: name is used as the key into the dictionary, and unique is a pointer stored with the entry, which assures that this entry hasn't been used for something else accidentally. So instead of a new slot tp_foo, what you would need to do is come up with a name ("tp_foo" comes to mind) and a global variable whose address can be used for unique.
-- Jack Jansen
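The name-plus-unique scheme can be tried out without Python at all. The following is a plain-C model under assumptions: ToyType, ToyType_SetAnnotation and ToyType_GetAnnotation are hypothetical stand-ins for the proposed functions, a small fixed array replaces the type's dictionary, and the setter returns a status code, which the proposed prototype does not.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_ANNOTATIONS 8

typedef struct ToyAnnotation {
    const char *name;       /* key, e.g. "tp_foo" */
    const void *unique;     /* address that identifies the key's owner */
    void *value;            /* the annotated pointer */
} ToyAnnotation;

/* Hypothetical type record; the array plays the role of tp_dict. */
typedef struct ToyType {
    const char *tp_name;
    size_t n_notes;
    ToyAnnotation notes[TOY_MAX_ANNOTATIONS];
} ToyType;

/* Store value under name. Returns 0 on success, -1 if the name is
   already owned by a different `unique` or the table is full. */
int ToyType_SetAnnotation(ToyType *tp, const char *name,
                          const void *unique, void *value)
{
    size_t i;

    assert(tp != NULL && name != NULL);
    for (i = 0; i < tp->n_notes; i++) {
        if (strcmp(tp->notes[i].name, name) == 0) {
            if (tp->notes[i].unique != unique)
                return -1;
            tp->notes[i].value = value;
            return 0;
        }
    }
    if (tp->n_notes == TOY_MAX_ANNOTATIONS)
        return -1;
    tp->notes[tp->n_notes].name = name;
    tp->notes[tp->n_notes].unique = unique;
    tp->notes[tp->n_notes].value = value;
    tp->n_notes++;
    return 0;
}

/* Look up name; the value is returned only if `unique` matches. */
void *ToyType_GetAnnotation(const ToyType *tp, const char *name,
                            const void *unique)
{
    size_t i;

    assert(tp != NULL && name != NULL);
    for (i = 0; i < tp->n_notes; i++) {
        if (strcmp(tp->notes[i].name, name) == 0)
            return tp->notes[i].unique == unique ? tp->notes[i].value : NULL;
    }
    return NULL;
}
```

A module would typically use the address of one of its own static variables as `unique`, so that another module choosing the same name by accident gets NULL back instead of a pointer it would misinterpret.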

I'm not sure I understand what you mean. Why would you need a tp_members entry for something that's in __dict__?
> Are you planning to address this issue in the future?
David Abrahams (of Boost++ fame) is also interested in a solution for this problem, so I may have to. Not in 2.2.1, though -- this will have to be rearchitected so it's a 2.3 issue.
--Guido van Rossum

From: "Martin v. Loewis" <martin@v.loewis.de> To: <jack@oratrix.com> Cc: <mal@lemburg.com>; <Jack.Jansen@oratrix.nl>; <python-dev@python.org> Sent: Friday, January 18, 2002 7:42 PM Subject: Re: [Python-Dev] Extending types in C - help needed
Hmm, not very much help ;-) Thomas

I believe the attached code implements your requirements. In particular, see PyArg_GenericCopy for an application that extracts a void* from an object through a type-safe protocol, then creates a clone of the original object through the same protocol. Both extractor and creator function are associated with the type object. To see this work in Python, run
Regards, Martin #include "Python.h" /************* Generic Converters ***************/ struct converters{ PyObject* (*create)(void*); int (*extract)(PyObject*, void**); }; char descr_string[] = "calldll converter structure"; void PyArg_AddConverters(PyTypeObject *type, struct converters* convs) { PyObject *cobj = PyCObject_FromVoidPtrAndDesc(convs, descr_string, NULL); PyDict_SetItemString(type->tp_dict, "__calldll__", cobj); Py_DECREF(cobj); } struct converters* PyArg_GetConverters(PyTypeObject *type) { PyObject *cobj; void *descr; cobj = PyObject_GetAttrString((PyObject*)type, "__calldll__"); if (!cobj) return NULL; descr = PyCObject_GetDesc(cobj); if (!descr) return NULL; if (descr != descr_string){ PyErr_SetString(PyExc_TypeError, "invalid cobj"); return NULL; } return (struct converters*)PyCObject_AsVoidPtr(cobj); } PyObject *PyArg_Create(PyTypeObject* type, void * value) { struct converters *convs = PyArg_GetConverters(type); if (!convs) return NULL; return convs->create(value); } int PyArg_Extract(PyObject* obj, void** value) { struct converters *convs = PyArg_GetConverters(obj->ob_type); if (!convs) return -1; convs->extract(obj, value); return 0; } PyObject* PyArg_GenericCopy(PyObject* obj) { void *tmp; if (PyArg_Extract(obj, &tmp)) return NULL; return PyArg_Create(obj->ob_type, tmp); } /************* End Generic Converters ***************/ typedef struct { PyObject_HEAD int handle; } HandleObject; staticforward PyTypeObject Handle_Type; #define HandleObject_Check(v) ((v)->ob_type == &Handle_Type) static HandleObject * newHandleObject(int i) { HandleObject *self; self = PyObject_New(HandleObject, &Handle_Type); if (self == NULL) return NULL; self->handle = i; return self; } /* Handle methods */ static void Handle_dealloc(HandleObject *self) { PyObject_Del(self); } /**************** Generic Converters: Handle support ***************/ static PyObject* handle_conv_new(void *s){ return (PyObject*)newHandleObject((int)s); } static int 
handle_conv_extract(PyObject *o, void **dest){ HandleObject *h = (HandleObject*)o; *dest = (void*)h->handle; return 0; } struct converters HandleConvs = { handle_conv_new, handle_conv_extract }; /**************** Generic Converters: Handle support ***************/ statichere PyTypeObject Handle_Type = { /* The ob_type field must be initialized in the module init function * to be portable to Windows without using C++. */ PyObject_HEAD_INIT(NULL) 0, /*ob_size*/ "handle.Handle", /*tp_name*/ sizeof(HandleObject), /*tp_basicsize*/ 0, /*tp_itemsize*/ /* methods */ (destructor)Handle_dealloc, /*tp_dealloc*/ 0, /*tp_print*/ 0, /*tp_getattr*/ 0, /*tp_setattr*/ 0, /*tp_compare*/ 0, /*tp_repr*/ 0, /*tp_as_number*/ 0, /*tp_as_sequence*/ 0, /*tp_as_mapping*/ 0, /*tp_hash*/ 0, /*tp_call*/ 0, /*tp_str*/ 0, /*tp_getattro*/ 0, /*tp_setattro*/ 0, /*tp_as_buffer*/ Py_TPFLAGS_DEFAULT, /*tp_flags*/ }; /* --------------------------------------------------------------------- */ static PyObject * xx_new(PyObject *self, PyObject *args) { HandleObject *rv; int h; if (!PyArg_ParseTuple(args, "i:new", &h)) return NULL; rv = newHandleObject(h); if ( rv == NULL ) return NULL; return (PyObject *)rv; } static PyObject * xx_copy(PyObject *self, PyObject *args) { PyObject *obj; if (!PyArg_ParseTuple(args, "O:copy", &obj)) return NULL; return PyArg_GenericCopy(obj); } static PyMethodDef xx_methods[] = { {"new", xx_new, METH_VARARGS}, {"copy", xx_copy, METH_VARARGS}, {NULL, NULL} /* sentinel */ }; DL_EXPORT(void) inithandle(void) { PyObject *m; Handle_Type.ob_type = &PyType_Type; PyType_Ready(&Handle_Type); PyArg_AddConverters(&Handle_Type, &HandleConvs); /* Create the module and add the functions */ m = Py_InitModule("handle", xx_methods); }

Indeed. I also think it is more appropriate than either a new metatype or a ParseTuple extension for the problem at hand (supporting arbitrary types in calldll), for the following reasons:

- There may be different ways in which an object converts to a "native" type. In particular, in some cases, ParseTuple may need to return (fill out) something more complex than a void*, something that calldll cannot support by nature.
- A type may need to provide various independent extensions to the standard protocols, e.g. it may provide "give me a Unicode doc string" in addition to "give me a conversion function to void*". In this case, you'd need multiple inheritance on the metatype level, something that does not reflect well in C.

For Python, it is much more common not to care at all about inheritance. Instead, just access the protocol, and expect an exception if it is not supported.

Also notice that this *does* make use of new-style classes: In 2.1, types did not have a tp_dict slot. Of course, the PyType_Ready call should go immediately before the place where tp_dict is accessed, and a check should be added whether tp_flags contains Py_TPFLAGS_HAVE_CLASS.

Regards, Martin

Wouldn't it suffice to check for tp_dict != NULL (after the call to PyType_Ready of course)?
No, see below (although I must admit that I wrote "Right" here first :-)
Hm. What does Py_TPFLAGS_HAVE_CLASS mean exactly?
According to the documentation, it means that the underlying TypeObject structure has the necessary fields in its C declaration.
Or, better, since Py_TPFLAGS_DEFAULT contains Py_TPFLAGS_HAVE_CLASS, what does it mean when Py_TPFLAGS_HAVE_CLASS is NOT in tp_flags?
It means you have been loading a module built for an earlier Python version, which had a different setting for Py_TPFLAGS_DEFAULT and a shorter definition of the TypeObject. If you try to access tp_dict in such an object, you are accessing random memory. This may crash immediately, or only when you pass the pointer you got to the dictionary functions. Regards, Martin
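The failure mode described here can be illustrated without the real CPython headers. The following is only a toy model under stated assumptions: `old_type`, `new_type`, `TOY_HAVE_CLASS` and `toy_get_dict` are hypothetical names standing in for the shorter pre-2.2 type struct, the longer 2.2 one, Py_TPFLAGS_HAVE_CLASS, and a guarded tp_dict access.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, NOT the real CPython structures; all names are hypothetical. */
#define TOY_HAVE_CLASS (1L << 0)   /* plays the role of Py_TPFLAGS_HAVE_CLASS */

/* Layout that a module compiled against the older headers would contain. */
struct old_type {
    const char *name;
    long flags;
};

/* Newer layout: the old layout first, then an extra field (think tp_dict). */
struct new_type {
    struct old_type base;
    void *dict;
};

/* Return the dict field only if the flag promises that the field exists.
 * Without the flag the object may be a bare old_type, and reading ->dict
 * would touch memory past its end. */
static void *toy_get_dict(const struct old_type *t)
{
    assert(t != NULL);
    if (!(t->flags & TOY_HAVE_CLASS))
        return NULL;
    return ((const struct new_type *)t)->dict;
}
```

This also suggests why testing the field for NULL alone is not enough: the test itself would already read past the end of an old-layout object.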

On Friday, January 18, 2002, at 08:36 PM, Martin v. Loewis wrote:
>> Also, the method seems rather complicated for doing a simple thing. The
>> only thing I really want is a way to refer to an _New or _Convert method
>> from Python code.
>
> I believe the attached code implements your requirements.

Martin, hats off! This does exactly what I want, and it does so in a pretty generalized way. Actually in _such_ a generalized way that I think this should be documented loud and clear.

Looking at it a bit more, how about storing each function pointer in a separate PyCObject, and adding general APIs somewhere in the core

    void PyType_SetAnnotation(PyTypeObject *tp, char *name, char *descr, void *);
    void *PyType_GetAnnotation(PyTypeObject *tp, char *name, char *descr);

(I've picked the name annotation here, because it sort-of feels like that, another name may bring the idea across better).

-- - Jack Jansen <Jack.Jansen@oratrix.com> http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman -

Thanks!
I'll happily add that to some recipe collection. However, before generalizing it, I'd like to see more use cases. There should, at least, be a *second* application beyond calldll (or, perhaps even beyond MacPython). Generalizing from a single use case is not good. Regards, Martin

[Martin's PyCObject based Handle object] This seems to be very close to the __reduce__ idea I posted on this thread a couple of days ago. Why not extend it to fully support this standard Python protocol ? -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Company & Consulting: http://www.egenix.com/ Python Software: http://www.egenix.com/files/python/

Because it is not clear, to me, what specifically the semantics of this protocol are. I wrote it to support MacOS calldll. I cannot see applicability beyond this API. One of the strengths of OO and polymorphism is precisely that users can freely extend the protocols that their objects support, without requiring *all* objects to support the protocol. A standard protocol should be clearly useful cross-platform, for many different types, in different applications. Regards, Martin

From: "Jack Jansen" <Jack.Jansen@oratrix.nl>
Currently (after quite some time) I have the impression that you cannot create a subtype of PyType_Type in C because PyType_Type ends in a variable sized array, at least not in this way:

    struct {
        PyTypeObject type;
        ...additional fields...
    } WrapperType_Type;

Can someone confirm this? (I have to find out what to do with the tp_members slot, which seems to correspond to the Python level __slots__ class variable) Thomas

Yes, alas. The type you would have to declare is 'etype', a private type in typeobject.c. --Guido van Rossum (home page: http://www.python.org/~guido/)

I wish I had time to explain this, but I don't. For now, you'll have to read how types are initialized in typeobject.c -- maybe there's a way, maybe there isn't.
Any tips about the route to take?
It can be done easily dynamically. --Guido van Rossum (home page: http://www.python.org/~guido/)

From: "Guido van Rossum" <guido@python.org>
I'm still struggling with this. How can it be done dynamically? My idea would be to realloc() the object after creation, adding a few bytes at the end. The problem is that I don't know how to find out about the object size without knowledge about the internals. The formula given in PEP 253 type->tp_basicsize + nitems * type->tp_itemsize seems not to be valid any more (at least with CYCLE GC). Thomas

I have thought about this a little more and come to the conclusion that you cannot define a metaclass that creates type objects that have more C slots than the standard type object layout. It would be the same as trying to add a C slot to the instances of a string subtype: there's variable-length data at the end, and you cannot place anything *before* that variable-length data because all the C code that works with the base type knows where the variable-length data starts; you cannot place anything *after* that variable-length data because there's no way to address it from C. The only way around this would be to duplicate *all* the code of type_new(), which I don't recommend because it will probably have to be changed for each Python version (even for bugfix releases). A better solution is to store additional information in the __dict__. --Guido van Rossum (home page: http://www.python.org/~guido/)
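The layout argument can be checked with a small stand-alone C sketch. It does not use the real type object; `var_base` and `naive_sub` are hypothetical structs chosen only to show that a field appended after the fixed part lands on the first variable-length element, and that the position after the variable part is not a compile-time constant.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical variable-length base, loosely analogous to an object whose
 * trailing data is sized at allocation time. */
struct var_base {
    size_t nitems;
    int items[];            /* C99 flexible array member */
};

/* A naive "subtype" that appends one fixed field after the base's
 * fixed part. */
struct naive_sub {
    size_t nitems;
    int extra;
};

/* Nonzero if the appended field occupies the same bytes as items[0],
 * i.e. code written for var_base would overwrite it. */
static int extra_overlaps_items(void)
{
    return offsetof(struct naive_sub, extra) == offsetof(struct var_base, items);
}

/* Where a field placed after the variable part would begin: the offset
 * depends on nitems, so no fixed struct declaration can name it. */
static size_t offset_after_items(size_t nitems)
{
    return offsetof(struct var_base, items) + nitems * sizeof(int);
}

/* Sanity check of the two claims for a couple of sizes. */
static void check_layout_claims(void)
{
    assert(extra_overlaps_items());
    assert(offset_after_items(3) > offset_after_items(1));
}
```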

On Wed, Feb 06, 2002 at 09:36:27AM -0500, Guido van Rossum wrote:
I had a half-baked idea when I read this. Is there something unworkable about the scheme, aside from being very different from the way Python currently operates? Has anybody written a system that works this way? Is it just plain gross?

Jeff Epler
jepler@inetnebr.com

Half-Baked Idea
---------------

The problem is that we have variable-length types. For example,

    struct S {
        int nelem;
        int elem[0];
    };

you can allocate a new one by

    struct S *new_S(int nelem) {
        struct S *ret = malloc(sizeof(struct S) + nelem * sizeof(int));
        ret->nelem = nelem;
        return ret;
    }

Normally, we "subclass" structures by appending fields to the end:

    struct BASE {
        int x, y;
    };

    struct DERIVED {
        /* from struct BASE */
        int x, y;
        int flag;
    };

but this doesn't work with a dynamic-length object. So, with the caveat that you can only have dynamic-length behavior in the base class, why not place the new fields *BEFORE* the fields of the base struct:

    struct S2 {
        int flag;
        int nelem;
        int elem[0];
    };

now, whenever you are going to pass S2 to a function on S, you simply pass in

    (struct S*)((char*)s2 + offsetof(struct S2, nelem))

and if you're faced with an instance of S that turns out to be an S2, you can get the pointer to the start of S2 with

    (struct S2*)((char*)s - offsetof(struct S2, nelem))

Note that neither of these is an additional level of indirection, it's just an offset calculation, one that your compiler may be able to combine with subsequent field accesses through the -> operator.

But how do you free an instance of S-or-subclass, without knowing all the subclasses? Well, you could store a pointer to the real start of the structure, or an offset back to it, in the structure. You'd use that pointer only on a few occasions; usually you'd use the "add constant to pointer" calculation in functions which are for a particular subclass of S:

    struct S {
        void *real_head;
        int nelem;
        int elem[0];
    };

    struct S1 {
        /* derived from S */
        int flag;
        void *real_head;
        int nelem;
        int elem[0];
    };

    struct S1_1 {
        /* derived from S1 */
        int new_flag;
        int flag;
        void *real_head;
        int nelem;
        int elem[0];
    };

now, you can allocate a version of an S subclass by

    struct S *new_S(int nelem, int pre_size) {
        char *mem = malloc(sizeof(struct S) + nelem * sizeof(int) + pre_size);
        struct S *ret = (struct S *)(mem + pre_size);
        ret->real_head = mem;
        ret->nelem = nelem;
        return ret;
    }

and free it by

    void free_S(struct S* s) {
        free(s->real_head);
    }

I don't know how this will interact with a garbage collector, but it does maintain a pointer to the head of the allocated block, though that pointer is only accessible through a pointer to the inside of a block.
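The fragments in this message can be assembled into one compilable sketch for a single subclass. This is an illustration under assumptions rather than the author's code: the helpers `S1_as_S`, `S_as_S1` and `new_S1` and the macro `S1_PREFIX` are hypothetical names, `elem[]` is the C99 spelling of `elem[0]`, and `new_S` stores `real_head` explicitly because `free_S` relies on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Base: variable-length, carries a pointer back to the malloc'ed block. */
struct S {
    void *real_head;
    int nelem;
    int elem[];             /* C99 spelling of the message's elem[0] */
};

/* "Subclass": the new field sits BEFORE the base fields. */
struct S1 {
    int flag;
    void *real_head;
    int nelem;
    int elem[];
};

/* Bytes of subclass data that precede the embedded base. Being the offset
 * of a pointer member, it keeps the base suitably aligned. */
#define S1_PREFIX offsetof(struct S1, real_head)

static struct S *new_S(int nelem, size_t pre_size)
{
    char *mem;
    struct S *ret;
    assert(nelem >= 0);
    mem = malloc(pre_size + sizeof(struct S) + (size_t)nelem * sizeof(int));
    if (mem == NULL)
        return NULL;
    ret = (struct S *)(mem + pre_size);
    ret->real_head = mem;   /* remember where the block really starts */
    ret->nelem = nelem;
    return ret;
}

static void free_S(struct S *s)
{
    if (s != NULL)
        free(s->real_head);
}

/* Constant-offset conversions between the two views of one block. */
static struct S *S1_as_S(struct S1 *s1)
{
    return (struct S *)((char *)s1 + S1_PREFIX);
}

static struct S1 *S_as_S1(struct S *s)
{
    return (struct S1 *)((char *)s - S1_PREFIX);
}

/* Shared fields are written through struct S only; the subclass view is
 * used just for its own prefix field. */
static struct S1 *new_S1(int nelem, int flag)
{
    struct S *s = new_S(nelem, S1_PREFIX);
    struct S1 *s1;
    if (s == NULL)
        return NULL;
    s1 = S_as_S1(s);
    s1->flag = flag;
    return s1;
}
```

Code that only knows about `struct S` can free the object through `free_S` without knowing how large the prefix is, which is the point of keeping `real_head` in the base.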

[Idea about extending variable-length structures at the front instead of at the back] The problem with applying this idea to Python objects, IMO, is that Python requires the object header to be at the start. Anything operating on a PyObject * expects that it can use the Py_INCREF and Py_DECREF macros, and those expect the refcount to be the first field and the type pointer to be the second. So our objects are already constrained at the front. Also, the GC implementation already uses this trick: it adds three fields in front of the structure. But then it assumes you can use fixed address calculations to translate between the object and the GC header. Adding something in front of the GC header would be too painful. --Guido van Rossum (home page: http://www.python.org/~guido/)
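The fixed address calculation mentioned for the GC header can be modelled in a few lines. This is a toy sketch, not the CPython source: `toy_gc_head`, `toy_object`, `TOY_AS_GC` and `TOY_FROM_GC` are hypothetical stand-ins, and the real definitions differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins only; the real CPython definitions differ. */
struct toy_gc_head {        /* three fields placed in front of the object */
    struct toy_gc_head *next;
    struct toy_gc_head *prev;
    long refs;
};

struct toy_object {
    long refcount;          /* first field, as the refcount macros expect */
    void *type;             /* second field: the type pointer */
};

/* Fixed-size header, so both directions are plain pointer arithmetic. */
#define TOY_AS_GC(o)   ((struct toy_gc_head *)(o) - 1)
#define TOY_FROM_GC(g) ((struct toy_object *)((struct toy_gc_head *)(g) + 1))

static struct toy_object *toy_gc_new(void)
{
    struct toy_gc_head *g = malloc(sizeof *g + sizeof(struct toy_object));
    struct toy_object *o;
    if (g == NULL)
        return NULL;
    g->next = g->prev = NULL;
    g->refs = 0;
    o = TOY_FROM_GC(g);
    o->refcount = 1;
    o->type = NULL;
    return o;
}

static void toy_gc_del(struct toy_object *o)
{
    if (o == NULL)
        return;
    assert(TOY_FROM_GC(TOY_AS_GC(o)) == o);
    free(TOY_AS_GC(o));     /* the block starts at the header, not at o */
}
```

Because both macros assume the header sits exactly `sizeof(struct toy_gc_head)` bytes before the object, a further prefix of subclass-dependent size in front of it would break the calculation.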

It's a pity, isn't it?
A better solution is to store additional information in the __dict__.
You lose nice features, for example accessing these (new) slots from Python by providing tp_members entries for them. Are you planning to address this issue in the future? Thanks, Thomas

[Guido]
A better solution is to store additional information in the __dict__.
[Thomas]
You lose nice features, for example accessing these (new) slots from Python by providing tp_members entries for them.
This thread is IMO closed; just for completeness, I want to mention that the same effect can be accomplished easily with tp_getset. Thomas

On Wednesday, February 6, 2002, at 09:53 PM, Thomas Heller wrote:
Martin pointed at a way to solve this. And I think that with my proposed API (... where is it..., ah yes, found it)

    void PyType_SetAnnotation(PyTypeObject *tp, char *name, void *unique, void *);
    void *PyType_GetAnnotation(PyTypeObject *tp, char *name, void *unique);

it would be almost as easy to use as a tp_ slot. The only thing needed to make it 100% safe is a registry for name/descr pairs. (Actually the API has changed a little now that I understand how the second arg works.)

For the benefit of whoever missed the previous thread: name is used as the key into the dictionary, and unique is a pointer stored with the entry, which ensures that this entry hasn't been used for something else accidentally. So instead of a new slot tp_foo, what you would need to do is come up with a name ("tp_foo" comes to mind) and a global variable whose address can be used for unique.

-- - Jack Jansen <Jack.Jansen@oratrix.com> http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman -
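The name/unique scheme can be sketched in plain C without the interpreter. Everything here is hypothetical: `toy_type` with its small table stands in for a type object and its dictionary, and `toy_set_annotation` / `toy_get_annotation` only mirror the shape of the proposed functions. Refusing to overwrite an entry stored under a different unique pointer is an assumption of this sketch, not something the proposal specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_ANNOTATIONS 8

struct toy_annotation {
    const char *name;       /* dictionary key; caller keeps it alive */
    void *unique;           /* tag identifying who stored the entry */
    void *value;            /* payload, e.g. a table of function pointers */
};

/* Stand-in for a type object; the table plays the role of its dict. */
struct toy_type {
    struct toy_annotation table[TOY_MAX_ANNOTATIONS];
    size_t used;
};

/* Store value under name. Returns 0 on success, -1 if the table is full
 * or the name is already taken under a different unique tag. */
static int toy_set_annotation(struct toy_type *tp, const char *name,
                              void *unique, void *value)
{
    size_t i;
    assert(tp != NULL && name != NULL);
    for (i = 0; i < tp->used; i++) {
        if (strcmp(tp->table[i].name, name) == 0) {
            if (tp->table[i].unique != unique)
                return -1;
            tp->table[i].value = value;
            return 0;
        }
    }
    if (tp->used == TOY_MAX_ANNOTATIONS)
        return -1;
    tp->table[tp->used].name = name;
    tp->table[tp->used].unique = unique;
    tp->table[tp->used].value = value;
    tp->used++;
    return 0;
}

/* Return the payload only if both the name and the unique tag match. */
static void *toy_get_annotation(const struct toy_type *tp, const char *name,
                                void *unique)
{
    size_t i;
    assert(tp != NULL && name != NULL);
    for (i = 0; i < tp->used; i++) {
        if (strcmp(tp->table[i].name, name) == 0)
            return tp->table[i].unique == unique ? tp->table[i].value : NULL;
    }
    return NULL;
}
```

A caller would use the address of one of its own global variables as `unique`, so a lookup under the same name by unrelated code gets NULL rather than a pointer it would misinterpret.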

I'm not sure I understand what you mean. Why would you need a tp_members entry for something that's in __dict__?
Are you planning to address this issue in the future?
David Abrahams (of Boost++ fame) is also interested in a solution for this problem, so I may have to. Not in 2.2.1, though -- this will have to be rearchitected so it's a 2.3 issue. --Guido van Rossum (home page: http://www.python.org/~guido/)
participants (7)
- Guido van Rossum
- Jack Jansen
- Jack Jansen
- Jeff Epler
- M.-A. Lemburg
- Martin v. Loewis
- Thomas Heller