refcounting vs PyModule_AddObject

I've just fixed a bug where Py_INCREF wasn't called when it should have been before a call to PyModule_AddObject (rev. 2.62 of Modules/threadmodule.c). So I went looking for other instances of the same problem. I didn't find any (though I don't understand how _csv.c gets away with line 1579), but what I *did* find were absolutely masses of what seemed like unnecessary increfs. Consider this code from _hotshot.c: Py_INCREF(&LogReaderType); PyModule_AddObject(module, "LogReaderType", (PyObject *)&LogReaderType); Py_INCREF(&ProfilerType); PyModule_AddObject(module, "ProfilerType", (PyObject *)&ProfilerType); if (ProfilerError == NULL) ProfilerError = PyErr_NewException("hotshot.ProfilerError", NULL, NULL); if (ProfilerError != NULL) { Py_INCREF(ProfilerError); PyModule_AddObject(module, "ProfilerError", ProfilerError); } The first two calls are fine; it was an incref like this that was missing in threadmodule.c. The second seems like a reference "leak": PyErr_NewException returns a new reference, then the Py_INCREF/PyModule_AddObject pair is refcount neutral, so control falls off the end of the function owning a reference to ProfilerError. I think the Py_INCREF should just be removed, but I'm wondering if I'm missing something... Cheers, mwh -- MacOSX: Sort of like a pedigree persian cat. Very sleek, very sexy, but a little too prone to going cross-eyed, biting you on your thumb and then throwing up on your trousers. -- Jim's pedigree of operating systems, asr

Michael> So I went looking for other instances of the same problem. I Michael> didn't find any (though I don't understand how _csv.c gets away Michael> with line 1579)... Same reason the Py_INCREF of ProfileError isn't necessary I think. PyDict_New() returns a new reference which is passed onto PyModule_AddObject(). No Py_DECREF of the dialects dict occurs, so it exits reference-neutral w.r.t. the dialects object. Skip

Skip Montanaro <skip@pobox.com> writes:
Michael> So I went looking for other instances of the same problem. I Michael> didn't find any (though I don't understand how _csv.c gets away Michael> with line 1579)...
Same reason the Py_INCREF of ProfileError isn't necessary I think. PyDict_New() returns a new reference which is passed onto PyModule_AddObject(). No Py_DECREF of the dialects dict occurs, so it exits reference-neutral w.r.t. the dialects object.
Oops; I meant line 1590. Cheers, mwh -- Windows 2000: Smaller cow. Just as much crap. -- Jim's pedigree of operating systems, asr

Michael> ... (though I don't understand how _csv.c gets away Michael> with line 1579)... Michael> Oops; I meant line 1590. Hmmm... Me either. Is it possible it was just never DECREF'd? I checked in the obvious fix on both head and the 2.4 release branch. Skip

Hi Michael, On Wed, Jun 15, 2005 at 01:35:35PM +0100, Michael Hudson wrote:
if (ProfilerError == NULL) ProfilerError = PyErr_NewException("hotshot.ProfilerError", NULL, NULL); if (ProfilerError != NULL) { Py_INCREF(ProfilerError); PyModule_AddObject(module, "ProfilerError", ProfilerError); }
I think the Py_INCREF is needed here. The ProfilerError is a global variable that needs the extra reference. Otherwise, a malicious user could do "del _hotshot.ProfilerError" and have it garbage-collected under the feet of _hotshot.c which still uses it. What I don't get is how ProfilerError could fail to be NULL in the first 'if' above, but that's a different matter. While we're at strange refcounting problems, PyModule_AddObject() only decrefs its last argument if no error occurs. This is probably wrong. In general I've found that the C modules' init code is fragile. This might be due to the idea that it runs only once anyway, and global C-module objects are immortal anyway, so sloppiness sneaks in. But for example, the following is common: m = Py_InitModule3("xxx", NULL, module_doc); Py_INCREF(&Xxx_Type); PyModule_AddObject(m, "Xxx", (PyObject *)&Xxx_Type); This generates a segfault if Py_InitModule3() returns NULL (however rare that situation is). A bientot, Armin

Armin Rigo <arigo@tunes.org> writes:
Hi Michael,
On Wed, Jun 15, 2005 at 01:35:35PM +0100, Michael Hudson wrote:
if (ProfilerError == NULL) ProfilerError = PyErr_NewException("hotshot.ProfilerError", NULL, NULL); if (ProfilerError != NULL) { Py_INCREF(ProfilerError); PyModule_AddObject(module, "ProfilerError", ProfilerError); }
I think the Py_INCREF is needed here. The ProfilerError is a global variable that needs the extra reference. Otherwise, a malicious user could do "del _hotshot.ProfilerError" and have it garbage-collected under the feet of _hotshot.c which still uses it.
Hmm. Point. But then how doesn't this apply to things like 'del thread._local'? (after my recent fix)
What I don't get is how ProfilerError could fail to be NULL in the first 'if' above, but that's a different matter.
Well, could it fail to be NULL in the multiple interpreters case? Though I'm not at all sure that reusing is wise in that case...
While we're at strange refcounting problems, PyModule_AddObject() only decrefs its last argument if no error occurs. This is probably wrong.
Quite probably :-/
In general I've found that the C modules' init code is fragile. This might be due to the idea that it runs only once anyway, and global C-module objects are immortal anyway, so sloppiness sneaks in.
Oh yes. Cheers, mwh -- It's actually a corruption of "starling". They used to be carried. Since they weighed a full pound (hence the name), they had to be carried by two starlings in tandem, with a line between them. -- Alan J Rosenthal explains "Pounds Sterling" on asr

Michael Hudson wrote:
if (ProfilerError == NULL) ProfilerError = PyErr_NewException("hotshot.ProfilerError", NULL, NULL); if (ProfilerError != NULL) { Py_INCREF(ProfilerError); PyModule_AddObject(module, "ProfilerError", ProfilerError); }
I think the Py_INCREF should just be removed, but I'm wondering if I'm missing something...
It may be me who is missing something, but... On reference is added to the dictionary, this is the one the explicit INCREF creates. The other reference is held in the C variable ProfilerError; this is the one that creating the exception object creates. It is convention that C variables which are explicitly used also hold their own references, even if they are global, and even if there is no procedure to clear them. The reason is that they would become stale if the module object went away. As there is no way to protect against this case, they just keep a garbage reference. Regards, Martin

"Martin v. Löwis" <martin@v.loewis.de> writes:
Michael Hudson wrote:
if (ProfilerError == NULL) ProfilerError = PyErr_NewException("hotshot.ProfilerError", NULL, NULL); if (ProfilerError != NULL) { Py_INCREF(ProfilerError); PyModule_AddObject(module, "ProfilerError", ProfilerError); }
I think the Py_INCREF should just be removed, but I'm wondering if I'm missing something...
It may be me who is missing something, but...
Well, quite possibly not.
On reference is added to the dictionary, this is the one the explicit INCREF creates. The other reference is held in the C variable ProfilerError; this is the one that creating the exception object creates.
It is convention that C variables which are explicitly used also hold their own references, even if they are global, and even if there is no procedure to clear them. The reason is that they would become stale if the module object went away. As there is no way to protect against this case, they just keep a garbage reference.
This means two things, as I see it: 1) Py_Initialize()/Py_Finalize() loops are going to leak quite a lot. Maybe we don't care about this. 2) In the case of the init_hotshot code above and such a loop, the ProfilerError object from the first interpreter will be reused by the second, which doesn't seem like a good idea (won't it be inheriting from the wrong PyExc_Exception?). Currently running Demo/embed/loop 'import gc' crashes for a similar kind of reason -- the gc.garbage object is shared between interpreters, but the only reference to it is in the module's __dict__ (well, if the module exists...). I've been looking at this area partly to try and understand this bug: [ 1163563 ] Sub threads execute in restricted mode but I'm not sure the whole idea of multiple interpreters isn't inherently doomed :-/ Cheers, mwh -- Premature optimization is the root of all evil. -- Donald E. Knuth, Structured Programming with goto Statements

Michael Hudson wrote:
I've been looking at this area partly to try and understand this bug:
[ 1163563 ] Sub threads execute in restricted mode
but I'm not sure the whole idea of multiple interpreters isn't inherently doomed :-/
That's what Tim asserts, saying that people who want to use the feature should fix it themselves. Regards, Martin

"Martin v. Löwis" <martin@v.loewis.de> writes:
Michael Hudson wrote:
I've been looking at this area partly to try and understand this bug:
[ 1163563 ] Sub threads execute in restricted mode
but I'm not sure the whole idea of multiple interpreters isn't inherently doomed :-/
That's what Tim asserts, saying that people who want to use the feature should fix it themselves.
Well, they've tried, and I think I've worked out a proper fix (life would be easier if people didn't check in borken code :). Cheers, mwh -- Premature optimization is the root of all evil. -- Donald E. Knuth, Structured Programming with goto Statements

[Michael Hudson]
I've been looking at this area partly to try and understand this bug:
[ 1163563 ] Sub threads execute in restricted mode
but I'm not sure the whole idea of multiple interpreters isn't inherently doomed :-/
[Martin v. Löwis]
That's what Tim asserts, saying that people who want to use the feature should fix it themselves.
I haven't said they're doomed, although I have said that people who want to use them are on their own. I think the latter is simply an obvious truth, since (a) multiple interpreters have been entirely unexercised by the Python test suite ("if it's not tested, it's broken" rules); (b) Python developers generally pay no attention to them; and (c), as the teensy bit of docs for them imply, they're an "80% solution", but to a problem that's probably more sensitive than most to glitches in the other 20%. I've also said that Mark's thread state PEP explicitly disavowed responsibility for working nicely with multiple interpreters. I said that only because that's what his PEP said <wink>.
participants (5)
-
"Martin v. Löwis"
-
Armin Rigo
-
Michael Hudson
-
Skip Montanaro
-
Tim Peters