Py_TPFLAGS_HEAPTYPE too overloaded
I'm writing a C Python extension that needs to generate PyTypeObjects dynamically. Unfortunately, the Py_TPFLAGS_HEAPTYPE flag is overloaded in a way that makes it difficult to achieve this goal.
The documentation for Py_TPFLAGS_HEAPTYPE says:
Py_TPFLAGS_HEAPTYPE
This bit is set when the type object itself is allocated on the heap. In this case, the ob_type field of its instances is considered a reference to the type, and the type object is INCREF’ed when a new instance is created, and DECREF’ed when an instance is destroyed (this does not apply to instances of subtypes; only the type referenced by the instance’s ob_type gets INCREF’ed or DECREF’ed).
This sounds like exactly what I want. I want my type object INCREF'd and DECREF'd by its instances so it doesn't leak or get deleted prematurely. If this were all that Py_TPFLAGS_HEAPTYPE did, it would work great for me.
Unfortunately, Py_TPFLAGS_HEAPTYPE is also overloaded to mean "user-defined type" (as opposed to a built-in type). It controls numerous subtle behaviors such as:
- whether the type's name is module.type or just type.
- whether you're allowed to set __name__, __module__, or __bases__ on the type.
- whether you're allowed to set __class__ on instances of this type.
- whether the module name comes from the type name or the __module__ attribute.
- whether it will use type->tp_doc as the docstring.
- whether its repr() calls it a "class" or a "type".
- whether you can set attributes of the type.
- whether someone is attempting the Carlo Verre hack.
So I'm stuck with an unenviable choice. I think the lesser of two evils is to *not* specify Py_TPFLAGS_HEAPTYPE, because the worst that will happen is that my types will leak. This is not as bad as having someone set __class__ on one of my instances, or set attributes on my type, etc.
Ideally the interpreter would have a separate flag like Py_TPFLAGS_BUILTIN that would trigger all of the above behaviors, but still make it possible to have dynamically generated built-in types get garbage collected appropriately.
At the very least, the documentation I cited above should make it clear that Py_TPFLAGS_HEAPTYPE controls more than just whether the type gets INCREF'd and DECREF'd. Based on the list of behaviors I discovered above, it is almost certainly not correct for a C extension type to be declared with Py_TPFLAGS_HEAPTYPE.
Josh
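Several of the behaviors listed in this message can be observed from the Python side by comparing a built-in type with an ordinary class, which CPython allocates as a heap type. The following is only a minimal sketch, not code from the original post: `UserType`, `raises_type_error`, and the three small helper functions are hypothetical names chosen for illustration, and the exact error messages (and, in newer CPython releases, the flag that governs some of these checks) may differ by version.

```python
def raises_type_error(action):
    """Return True if calling action() raises TypeError."""
    try:
        action()
    except TypeError:
        return True
    return False


class UserType:
    """An ordinary class; CPython allocates it as a heap type."""


def set_type_attr(tp):
    # Try to add a new attribute to the type object itself.
    tp.extra = 1


def rename(tp):
    # Try to change the type's __name__.
    tp.__name__ = "Renamed"


def verre_hack(tp):
    # Try to bypass type.__setattr__ by calling object's version directly.
    object.__setattr__(tp, "extra", 1)


# Built-in type: each of these mutations is rejected.
builtin_results = [
    raises_type_error(lambda: set_type_attr(int)),
    raises_type_error(lambda: rename(int)),
    raises_type_error(lambda: verre_hack(int)),
]

# Heap type: ordinary mutations are accepted.
heap_results = [
    raises_type_error(lambda: set_type_attr(UserType)),
    raises_type_error(lambda: rename(UserType)),
]
```

Under these assumptions, every entry in `builtin_results` is True and every entry in `heap_results` is False, which is the split between "locked down" and "open" types that the message describes.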
Joshua Haberman wrote:
This is not as bad as having someone set __class__ on one of my instances, or set attributes on my type, etc.
Is there any real need to prevent someone from doing those things? Note that even when you are allowed to change the __class__ of an instance, you're still prevented from changing it to something that has a different C layout, so you can't crash the interpreter that way. Similarly, built-in methods check that they're given an object of appropriate type at the C level.
My suggestion is to just let it be a full heap type and accept whatever consequences follow.
-- Greg
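The two safety checks mentioned here can be seen from Python. This is a small sketch under stated assumptions: the class names `Plain`, `AlsoPlain`, and `Slotted` are hypothetical, and `__slots__` is used only as a convenient way to get a class with a different instance layout.

```python
class Plain:
    """Heap type whose instances carry a __dict__."""


class AlsoPlain:
    """Same instance layout as Plain."""


class Slotted:
    """Different layout: a single slot and no __dict__."""
    __slots__ = ("x",)


obj = Plain()
obj.__class__ = AlsoPlain  # compatible layouts, so this is accepted

try:
    obj.__class__ = Slotted  # incompatible layouts, so this is refused
    layout_error = None
except TypeError as exc:
    layout_error = exc

try:
    list.append((), 1)  # a tuple passed where a list is required
    method_error = None
except TypeError as exc:
    method_error = exc
```

Both rejected operations fail with an ordinary TypeError rather than corrupting memory, which is the point being made about not being able to crash the interpreter this way.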
Greg Ewing <greg.ewing <at> canterbury.ac.nz> writes:
Joshua Haberman wrote:
This is not as bad as having someone set __class__ on one of my instances, or set attributes on my type, etc.
Is there any real need to prevent someone from doing those things?
My ultimate goal is to make my types as much like regular built-in types as possible. Python as a language has chosen to "lock down" built-in objects, even going so far as to specifically check for the "Carlo Verre hack." I defer to those decisions to answer the question "is there any real need to prevent someone from doing these things?" If it's important for the built-in types, why should it be less important for mine? I don't want my type to be a second-class citizen just because I happen to be dynamically allocating it. If I were writing this extension for a language like Ruby, for which it is convention that built-in classes are "open," then I wouldn't mind allowing these things. I'm just trying to make my extension as idiomatic and "native" as possible. Josh
Joshua Haberman wrote:
Python as a language has chosen to "lock down" built-in objects... If it's important for the built-in types, why should it be less important for mine?
I'm not really sure why so much trouble is taken to lock down builtin types -- it seems to go against Python's general consenting-adults philosophy. I suppose it's felt that you should be able to rely on builtin types not changing their behaviour. This is probably more important for the core types than those in extension modules. Many of the standard library classes are written in Python, so this protection doesn't extend to them.
I don't want my type to be a second-class citizen just because I happen to be dynamically allocating it.
I don't think anyone will regard your types as second-class because they allow you to do *more* with them. The only real concern would be if it were somehow possible to crash the interpreter by modifying the type dict. I don't see how that could happen -- but maybe someone else on python-dev knows more about this? -- Greg
Greg Ewing wrote:
The only real concern would be if it were somehow possible to crash the interpreter by modifying the type dict. I don't see how that could happen -- but maybe someone else on python-dev knows more about this?
I believe a major part of the issue is that attempting to answer the question "Will allowing mutating operation *X* on object/type/float/int/etc create any interpreter stability or security problems?" is awfully close to trying to prove a negative.
Certainly, at least some of the "generic" operations involving types include additional sanity checks that are bypassed for the builtin types.
One specific example I can think of is that object.__hash__ is special cased in a few places due to the way its definition interacts with the definition of comparison operations. Allowing changes to the contents of object's tp_hash slot could lead to much weirdness when it came to __hash__ inheritance.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
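One user-visible trace of the interaction between __hash__ and comparison can be shown in Python 3, where a class that overrides __eq__ without defining __hash__ has its __hash__ set to None. This is only an illustrative sketch of that interaction, not the internal special-casing referred to above; `DefaultEq` and `CustomEq` are hypothetical names.

```python
class DefaultEq:
    """Inherits both __eq__ and __hash__ from object."""


class CustomEq:
    """Overrides __eq__ only; Python 3 then sets __hash__ to None."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, CustomEq) and self.value == other.value


# DefaultEq defines no __hash__ of its own and simply inherits object's.
inherits_object_hash = "__hash__" not in vars(DefaultEq)

# CustomEq gets an explicit __hash__ = None entry in its namespace.
hash_disabled = vars(CustomEq)["__hash__"] is None

try:
    hash(CustomEq(1))
    unhashable_error = None
except TypeError as exc:
    unhashable_error = exc
```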
Nick Coghlan wrote:
One specific example I can think of is that object.__hash__ is special cased in a few places due to the way its definition interacts with the definition of comparison operations. Allowing changes to the contents of object's tp_hash slot could lead to much weirdness when it came to __hash__ inheritance.
Just thought of a much better example as I clicked send: the basic numeric types (especially int) are locked down because they are special-cased all over the place (including in the main interpreter eval loop) in order to speed up simple arithmetic.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Sun, Jul 26, 2009 at 11:01 AM, Joshua Haberman<joshua@reverberate.org> wrote:
I'm writing a C Python extension that needs to generate PyTypeObjects dynamically. Unfortunately, the Py_TPFLAGS_HEAPTYPE flag is overloaded in a way that makes it difficult to achieve this goal.
Hi Joshua, recently I also needed to dynamically create subtypes from C. I tried two ways of doing this: one is to do the C equivalent of calling type("name", (bases, ...), dict), and the other is to malloc() PyTypeObjects, fill in the slots, and run PyType_Ready on them to initialize them.
It seems the first is the expected way to make your own types, so I assume that's what you're doing? I'm just wondering because if you do it the second way, I think you'll have more control and the types will be more limited (like internal types).
I'm not expert enough in this area to know if malloc'ing a PyTypeObject and initializing it has some other problems.
- Campbell
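For reference, the first approach looks like this when written in Python rather than C. It is a minimal sketch: `make_subtype`, `describe`, `Base`, and `Derived` are hypothetical names, and the C version would make the same three-argument call on the type object through the generic call API.

```python
def describe(self):
    """Method installed into the dynamically created type."""
    return f"{type(self).__name__}(x={self.x})"


def make_subtype(name, base, namespace):
    """Create a new type at run time, as a class statement would."""
    return type(name, (base,), dict(namespace))


Base = make_subtype("Base", object, {"x": 0, "describe": describe})
Derived = make_subtype("Derived", Base, {"x": 5})

description = Derived().describe()
```

Types built this way are heap types, so they get the full "user-defined" behavior discussed earlier in the thread, including being open to later modification.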
Campbell Barton wrote:
I'm not expert enough in this area to know if malloc'ing PyTypeObject and initializing has some other problems.
The only problem is that such types will be expected to be around forever - they are not reference-counted like heap types, so there is no mechanism to free them once they are no longer needed.
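By contrast, the reference-counted lifetime of a heap type can be observed from Python with a weak reference. This is a sketch assuming CPython's reference counting and cycle collector; `make_temporary_type` and the other names are hypothetical.

```python
import gc
import weakref


def make_temporary_type():
    """Create a type and return a weak reference to it plus one instance."""
    temp = type("Temporary", (object,), {})
    instance = temp()  # the instance holds a strong reference to its type
    return weakref.ref(temp), instance


type_ref, instance = make_temporary_type()
gc.collect()
# The type survives because the instance still refers to it.
alive_while_instance_exists = type_ref() is not None

del instance
# Type objects sit in reference cycles, so run the collector explicitly.
gc.collect()
freed_after_last_instance = type_ref() is None
```

A statically allocated or malloc'ed PyTypeObject without Py_TPFLAGS_HEAPTYPE gets no such treatment, which is the leak the original message was willing to accept.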
participants (5)
- Campbell Barton
- Greg Ewing
- Hrvoje Niksic
- Joshua Haberman
- Nick Coghlan