Call PyType_Ready on builtin types during interpreter startup?
Some strangeness was recently reported for the range() type in Py3k: instances are unhashable until an attribute is retrieved from the range type itself, and then they become hashable. [1]

While there is definitely an associated bug in the range implementation (it doesn't block inheritance of the default object.__hash__ implementation), there's also the fact that when the interpreter *starts* the hash implementation hasn't been inherited yet, but it does get inherited later.

It turns out that _PyBuiltin_Init doesn't call PyType_Ready on any of the builtin types - they're left to have it called implicitly when an operation using them needs tp_dict filled in. Such operations (which include retrieving an attribute from the type object) will implicitly call PyType_Ready to populate tp_dict, which also has the side effect of inheriting slot implementations from base classes.

Is there a specific reason for not fully initialising the builtin types? Or should we be calling PyType_Ready on each of them from _PyBuiltin_Init?

Cheers,
Nick.

[1] http://bugs.python.org/issue4701

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
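The lazy-initialisation behaviour described in this message can be illustrated with a small toy model in Python. This is only a sketch of the idea, not CPython's implementation: ToyType, ready(), getattr_() and hash_of() are hypothetical names, the "slots" dict stands in for C-level slots such as tp_hash, and "attrs" stands in for tp_dict.

```python
# Toy model only: none of these names are CPython APIs.

class ToyType:
    def __init__(self, name, base=None, slots=None):
        self.name = name
        self.base = base
        self.slots = dict(slots or {})   # slots defined directly on this type
        self.attrs = None                # stands in for tp_dict; None = not ready

    def ready(self):
        """Fill in attrs and inherit missing slots from the base (idempotent)."""
        if self.attrs is not None:
            return
        if self.base is not None:
            self.base.ready()
            for slot, impl in self.base.slots.items():
                self.slots.setdefault(slot, impl)
        self.attrs = {"__name__": self.name}

    def getattr_(self, attr):
        self.ready()                     # attribute lookup readies the type lazily
        return self.attrs[attr]


def hash_of(tp, obj):
    """Hash through the type's slot WITHOUT readying the type first."""
    impl = tp.slots.get("hash")
    if impl is None:
        raise TypeError(f"unhashable type: '{tp.name}'")
    return impl(obj)


toy_object = ToyType("object", slots={"hash": id})
toy_enumerate = ToyType("enumerate", base=toy_object)
x = object()

try:
    hash_of(toy_enumerate, x)
    hashable_before = True
except TypeError:
    hashable_before = False              # slot not inherited yet

toy_enumerate.getattr_("__name__")       # side effect: inherits the hash slot
hashable_after = hash_of(toy_enumerate, x) == id(x)
```

In this model the instance only becomes hashable once an attribute lookup has triggered ready(), which mirrors the symptom reported for range(). Calling ready() on every type at startup would remove the dependence on lookup order.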
Nick Coghlan wrote:
Is there a specific reason for not fully initialising the builtin types? Or should we be calling PyType_Ready on each of them from _PyBuiltin_Init?
I need to correct this slightly: some builtin types *are* initialised properly by _Py_ReadyTypes.

So the question is actually whether or not the missing builtin types should be added to that function.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Nick Coghlan wrote:
Nick Coghlan wrote:
Is there a specific reason for not fully initialising the builtin types? Or should we be calling PyType_Ready on each of them from _PyBuiltin_Init?
I need to correct this slightly: some builtin types *are* initialised properly by _Py_ReadyTypes.
So the question is actually whether or not the missing builtin types should be added to that function.
I'm probably going to fix the specific problem with hashing of range objects in Py3k just by initialising xrange/range properly in _Py_ReadyTypes.

However, I wonder how many other builtin types have the same problem - for example, the enumerate type is also missing a call to PyType_Ready:

Python 3.1a0 (py3k, Dec 14 2008, 21:35:11)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> x = enumerate([])
>>> hash(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'enumerate'
>>> enumerate.__name__  # implicit call to PyType_Ready
'enumerate'
>>> hash(x)
-1212398692
Rather than playing whack-a-mole with this, does anyone have any ideas on how to systematically find types which are defined in the core, but are missing an explicit PyType_Ready call? (I guess one way would be to remove all the implicit calls in a local build and see what blows up... that seems a little drastic though.)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
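A cheap way to probe for the symptom from Python is to call hash() on freshly created instances before anything has touched their type objects. The sketch below is hedged accordingly: it only covers a hand-picked list of the types named in this thread, it detects the symptom rather than the missing PyType_Ready call itself, and find_unhashable() and Seq are hypothetical names introduced for illustration.

```python
def find_unhashable(samples):
    """Return the type names of the samples for which hash() raises TypeError."""
    failures = []
    for obj in samples:
        try:
            hash(obj)
        except TypeError:
            failures.append(type(obj).__name__)
    return failures


class Seq:
    """Old-style sequence (only __getitem__), so iter() uses the basic sequence iterator."""
    def __getitem__(self, index):
        if index < 3:
            return index
        raise IndexError(index)


samples = [
    range(3),
    enumerate([]),
    iter(Seq()),                  # basic sequence iterator
    iter(lambda: None, None),     # callable-with-sentinel iterator
]

failures = find_unhashable(samples)
print(failures)
```

On an interpreter where these types are readied during startup the list should come back empty. A non-empty result would point at types worth checking against _Py_ReadyTypes.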
Nick Coghlan wrote:
Rather than playing whack-a-mole with this, does anyone have any ideas on how to systematically find types which are defined in the core, but are missing an explicit PyType_Ready call? (I guess one way would be to remove all the implicit calls in a local build and see what blows up... that seems a little drastic though)
The whack-a-mole tactic did pick up a couple more though - the two "builtin" types that iter() can return (the basic sequence iterator and the callable-with-sentinel iterator).

Perhaps the path of least resistance is to change PyObject_Hash to be yet another place where PyType_Ready will be called implicitly if it hasn't been called already?

That approach would get us back to the Python 2.x status quo, where calling PyType_Ready was only absolutely essential if you wanted to correctly inherit a slot from a type other than object itself.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Nick Coghlan wrote:
Nick Coghlan wrote:
Rather than playing whack-a-mole with this, does anyone have any ideas on how to systematically find types which are defined in the core, but are missing an explicit PyType_Ready call? (I guess one way would be to remove all the implicit calls in a local build and see what blows up... that seems a little drastic though)
The whack-a-mole tactic did pick up a couple more though - the two "builtin" types that iter() can return (the basic sequence iterator and the callable with sentinel result iterator).
Perhaps the path of least resistance is to change PyObject_Hash to be yet another place where PyType_Ready will be called implicitly if it hasn't been called already?
I think that's the best thing to do. It would bring PyObject_Hash in line with PyObject_Format, for example. Eric.
On Sun, Dec 28, 2008 at 5:09 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Nick Coghlan wrote:
Nick Coghlan wrote:
Is there a specific reason for not fully initialising the builtin types? Or should we be calling PyType_Ready on each of them from _PyBuiltin_Init?
I need to correct this slightly: some builtin types *are* initialised properly by _Py_ReadyTypes.
So the question is actually whether or not the missing builtin types should be added to that function.
I'm probably going to fix the specific problem with hashing of range objects in Py3k just by initialising xrange/range properly in _Py_ReadyTypes.
However, I wonder how many other builtin types have the same problem - for example, the enumerate type is also missing a call to PyType_Ready:
Python 3.1a0 (py3k, Dec 14 2008, 21:35:11)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> x = enumerate([])
>>> hash(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'enumerate'
>>> enumerate.__name__  # implicit call to PyType_Ready
'enumerate'
>>> hash(x)
-1212398692
Rather than playing whack-a-mole with this, does anyone have any ideas on how to systematically find types which are defined in the core, but are missing an explicit PyType_Ready call? (I guess one way would be to remove all the implicit calls in a local build and see what blows up... that seems a little drastic though)
What I did with safethread was replace the implicit calls with assertions. That, combined with the test suite, should pick everything up.

--
Adam Olsen, aka Rhamphoryncus
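The assertion-based approach can be sketched with a toy model in Python. This is an illustration of the idea only, not the safethread code: ToyType, lookup() and audit() are hypothetical names, and the "test suite" is reduced to a list of types that get exercised.

```python
# Toy model only: none of these names come from CPython or safethread.

class ToyType:
    def __init__(self, name):
        self.name = name
        self.is_ready = False


def lookup(tp):
    """An operation that needs a ready type.

    Where the lazy version would quietly ready the type here, this version
    fails loudly so that the missing explicit call is noticed.
    """
    if not tp.is_ready:
        raise AssertionError(f"{tp.name} was never readied explicitly")
    return tp.name


def audit(readied_at_startup, exercised):
    """Ready the startup list, exercise the types, and report the ones that assert."""
    for tp in readied_at_startup:
        tp.is_ready = True
    missing = []
    for tp in exercised:
        try:
            lookup(tp)
        except AssertionError:
            missing.append(tp.name)
    return missing


toy_int = ToyType("int")
toy_range = ToyType("range")
toy_enumerate = ToyType("enumerate")

missing = audit(
    readied_at_startup=[toy_int],
    exercised=[toy_int, toy_range, toy_enumerate],
)
print(missing)
```

As in the real approach, only types that the exercised code actually touches are reported, so the coverage of the test suite bounds what can be found.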
On Sat, Dec 20, 2008, Nick Coghlan wrote:
It turns out that _PyBuiltin_Init doesn't call PyType_Ready on any of the builtin types - they're left to have it called implicitly when an operation using them needs tp_dict filled in.
This seems like a release blocker for 3.0.1 to me.

--
Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/

"It is easier to optimize correct code than to correct optimized code." --Bill Harlan
Aahz wrote:
On Sat, Dec 20, 2008, Nick Coghlan wrote:
It turns out that _PyBuiltin_Init doesn't call PyType_Ready on any of the builtin types - they're left to have it called implicitly when an operation using them needs tp_dict filled in.
This seems like a release blocker for 3.0.1 to me
The problem isn't actually as bad as I first thought: it turns out most of the builtin types *are* fully initialised in _Py_ReadyTypes, which is called from Py_InitializeEx.

However, xrange/range are definitely missing from that function, which is the actual proximate cause of the strange range() hashing behaviour in Py3k. I'm also still hoping someone knows why the numeric types aren't being readied there, given that certain parts of the core need additional handling to cope with the possibility that those types aren't fully initialised. For example, PyObject_Format has a lazy call to PyType_Ready with a comment noting that it may be asked to format floating point numbers before PyType_Ready has otherwise been called for the float type.

That said, I have still added the range() hashing problem to the list of release blockers.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
participants (4)
- Aahz
- Adam Olsen
- Eric Smith
- Nick Coghlan