Half-baked idea (was Re: [Python-Dev] Extending types in C - help needed)

Wed, 6 Feb 2002 10:24:31 -0600

On Wed, Feb 06, 2002 at 09:36:27AM -0500, Guido van Rossum wrote:
> I have thought about this a little more and come to the conclusion
> that you cannot define a metaclass that creates type objects that have
> more C slots than the standard type object lay-out.  It would be the
> same as trying to add a C slot to the instances of a string subtype:
> there's variable-length data at the end, and you cannot place anything
> *before* that variable-length data because all the C code that works
> with the base type knows where the variable length data start; you
> cannot place anything *after* that variable-length data because there's
> no way to address it from C.

I had a half-baked idea when I read this.  Is there something unworkable
about the scheme, aside from being very different from the way Python
currently operates?  Has anybody written a system that works this way?
Is it just plain gross?

Jeff Epler
jepler@inetnebr.com

Half-Baked Idea
---------------

The problem is that we have variable-length types.  For example,

    struct S {
	int nelem;
	int elem[0];
    };

You can allocate a new one with:
    struct S *new_S(int nelem) {
        /* header plus room for nelem trailing ints */
        struct S *ret = malloc(sizeof(struct S) + nelem * sizeof(int));
        if (ret == NULL)
            return NULL;
        ret->nelem = nelem;
        return ret;
    }

Normally, we "subclass" structures by appending fields to the end:
    struct BASE {
	int x, y;
    };

    struct DERIVED { /* from struct BASE */
	int x, y;
	int flag;
    };

but this doesn't work with a dynamic-length object.
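
To make the collision concrete, here is a small sketch.  The names S_bad
and appended_field_collides are hypothetical, invented only for this
illustration, and the flexible array member elem[] is the C99 spelling of
elem[0]:

```c
#include <stddef.h>   /* offsetof */

struct S {
    int nelem;
    int elem[];       /* variable-length data starts here */
};

/* Hypothetical "subclass" made the usual way, by appending a field. */
struct S_bad {
    int nelem;
    int flag;         /* lands exactly where S keeps elem[0] */
};

/* Returns 1 if the appended field overlaps the variable-length data. */
int appended_field_collides(void) {
    return offsetof(struct S_bad, flag) == offsetof(struct S, elem);
}
```

Any code written for struct S that stores into elem[0] would overwrite
flag, which is why the appended field has nowhere safe to live.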

So, with the caveat that you can only have dynamic-length behavior in the base
class, why not place the new fields *BEFORE* the fields of base struct:

    struct S2 {
	int flag;
	int nelem;
	int elem[0];
    };

Now, whenever you are going to pass an S2 to a function that operates on S,
you simply pass in
    (struct S*)((char*)s2 + offsetof(struct S2, nelem))
and if you're faced with an instance of S that turns out to be an S2, you
can get the pointer to the start of the S2 with
    (struct S2*)((char*)s - offsetof(struct S2, nelem))
Note that neither of these is an additional level of indirection; it's just
an offset calculation, one that your compiler may be able to combine with
subsequent field accesses through the -> operator.
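
A minimal sketch of the two conversions follows.  The helper names
(s2_as_s, s_as_s2, sum_S, new_S2) are my own, not part of any existing
API, and the code assumes that the fields from nelem onward are laid out
identically in S and S2, which holds for these definitions on common
compilers but is not something the C standard promises in general:

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */
#include <stdlib.h>   /* malloc, free */

struct S {
    int nelem;
    int elem[];       /* C99 flexible array member */
};

struct S2 {
    int flag;         /* new field, placed before the base fields */
    int nelem;
    int elem[];
};

/* View an S2 as an S: step forward over the prepended fields. */
struct S *s2_as_s(struct S2 *s2) {
    return (struct S *)((char *)s2 + offsetof(struct S2, nelem));
}

/* Recover the S2 from an S pointer known to point inside an S2. */
struct S2 *s_as_s2(struct S *s) {
    return (struct S2 *)((char *)s - offsetof(struct S2, nelem));
}

/* A function written for the base type only. */
int sum_S(const struct S *s) {
    int total = 0;
    for (int i = 0; i < s->nelem; i++)
        total += s->elem[i];
    return total;
}

/* Allocate an S2 with nelem zeroed elements; returns NULL on failure. */
struct S2 *new_S2(int nelem, int flag) {
    assert(nelem >= 0);
    struct S2 *ret = malloc(sizeof(struct S2) + (size_t)nelem * sizeof(int));
    if (ret == NULL)
        return NULL;
    ret->flag = flag;
    ret->nelem = nelem;
    for (int i = 0; i < nelem; i++)
        ret->elem[i] = 0;
    return ret;
}
```

With these, sum_S(s2_as_s(p)) works on an S2 without sum_S knowing that
S2 exists, and s_as_s2 gets back to the flag field.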

But how do you free an instance of S-or-subclass without knowing all the
subclasses?  Well, you could store a pointer to the real start of the
structure, or an offset back to it, in the structure.  You'd use that
pointer only on a few occasions; functions written for a particular
subclass of S would normally use the "add a constant to the pointer"
calculation instead:

    struct S {
	void *real_head;
	int nelem;
	int elem[0];
    };

    struct S1 { /* derived from S */
	int flag;
	void *real_head;
	int nelem;
	int elem[0];
    };

    struct S1_1 { /* derived from S1 */
	int new_flag;
	int flag;
	void *real_head;
	int nelem;
	int elem[0];
    };

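Now you can allocate an instance of S, or of a subclass of S, with:
    struct S *new_S(int nelem, int pre_size) {
        /* pre_size: bytes the subclass places before S; it must keep
           the S part suitably aligned */
        char *mem = malloc(sizeof(struct S) + nelem * sizeof(int) + pre_size);
        struct S *ret;
        if (mem == NULL)
            return NULL;
        ret = (struct S *)(mem + pre_size);
        ret->real_head = mem;    /* needed later by free_S */
        ret->nelem = nelem;
        return ret;
    }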
and free it by
    void free_S(struct S* s) {
	free(s->real_head);
    }
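
Putting the pieces together, here is a sketch of the whole scheme for one
subclass.  S1_prefix, S1_fields, new_S1 and align_up_for_S are
hypothetical names of my own; the sketch assumes a C11 compiler (for
_Alignof) and rounds the prefix size up so the S part stays properly
aligned, a detail the outline above glosses over:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct S {
    void *real_head;   /* start of the block returned by malloc */
    int nelem;
    int elem[];        /* C99 flexible array member */
};

/* Hypothetical prefix holding the fields S1 adds in front of S. */
struct S1_prefix {
    int flag;
};

/* Round n up to a multiple of the alignment that struct S needs. */
size_t align_up_for_S(size_t n) {
    size_t a = _Alignof(struct S);
    return (n + a - 1) / a * a;
}

/* Allocate an S preceded by at least pre_size bytes of subclass data. */
struct S *new_S(int nelem, size_t pre_size) {
    assert(nelem >= 0);
    size_t pad = align_up_for_S(pre_size);
    char *mem = malloc(pad + sizeof(struct S) + (size_t)nelem * sizeof(int));
    if (mem == NULL)
        return NULL;
    struct S *ret = (struct S *)(mem + pad);
    ret->real_head = mem;          /* remember where the block begins */
    ret->nelem = nelem;
    return ret;
}

/* Works for S and every subclass: no knowledge of the prefix needed. */
void free_S(struct S *s) {
    if (s != NULL)
        free(s->real_head);
}

/* The S1 fields sit immediately before the S part. */
struct S1_prefix *S1_fields(struct S *s) {
    return (struct S1_prefix *)((char *)s - sizeof(struct S1_prefix));
}

struct S *new_S1(int nelem, int flag) {
    struct S *s = new_S(nelem, sizeof(struct S1_prefix));
    if (s != NULL)
        S1_fields(s)->flag = flag;
    return s;
}
```

Only free_S ever reads real_head; S1-specific code reaches its fields
with the constant-offset subtraction in S1_fields.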

I don't know how this will interact with a garbage collector, but it
does maintain a pointer to the head of the allocated block, though that
pointer is only accessible through a pointer to the inside of a block.