[Python-Dev] Re: [Csv] csv module TODO list

Tim Peters tim.peters at gmail.com
Fri Jan 7 17:00:42 CET 2005


[Andrew McNamara]
>> Also, review comments from Jeremy Hylton, 10 Apr 2003:
>>
>>    I've been reviewing extension modules looking for C types that should
>>    participate in garbage collection.  I think the csv ReaderObj and
>>    WriterObj should participate.  The ReaderObj it contains a reference to
>>    input_iter that could be an arbitrary Python object.  The iterator
>>    object could well participate in a cycle that refers to the ReaderObj.
>>    The WriterObj has a reference to a writeline callable, which could well
>>    be a method of an object that also points to the WriterObj.

> I finally got around to looking at this, only to realise Jeremy did the
> work back in Apr 2003 (thanks). One question, however - the GC doco in
> the Python/C API seems to suggest to me that PyObject_GC_Track should be
> called on the newly minted object prior to returning from the initialiser
> (and correspondingly PyObject_GC_UnTrack should be called prior to
> dismantling). This isn't being done in the module as it stands. Is the
> module wrong, or is my understanding of the reference manual incorrect?

The purpose of "tracking" and "untracking" is to let cyclic gc know
when it (respectively) is and isn't safe to call an object's
tp_traverse method.  Primarily, when an object is first created at the
C level, it may contain NULLs or heap trash in pointer slots, and then
the object's tp_traverse could segfault if it were called while the
object remained in an insane (wrt tp_traverse) state.  Similarly,
cleanup actions in the tp_dealloc may make a tp_traverse-sane object
tp_traverse-insane, so tp_dealloc should untrack the object before
that occurs.

If tracking is never done, then the object effectively never
participates in cyclic gc:  its tp_traverse will never get called, and
it will effectively act as an external root (keeping itself and
everything reachable from it alive).  So, yes, track it during
construction, but not before all the members referenced by its
tp_traverse are in a sane state.  Putting the track call "at the end"
of the constructor is usually best practice.
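
As a rough Python-level illustration of what participating in cyclic gc
buys you (this is not the csv module's C code; the Node class and the
make_cycle helper are invented for the sketch): instances of ordinary
Python classes are tracked automatically, so a cycle that reference
counting alone cannot free is reclaimed by a collection.  A C type that
is never tracked would not get the second half of this.

```python
import gc
import weakref


class Node:
    """Hypothetical container; instances of plain classes are gc-tracked."""

    def __init__(self):
        self.other = None


def make_cycle():
    """Build a two-object cycle and return a weakref probe plus its tracked flag."""
    a, b = Node(), Node()
    a.other, b.other = b, a  # a -> b -> a: refcounts can never reach zero
    return weakref.ref(a), gc.is_tracked(a)


gc.disable()  # rule out an automatic collection between the two checks
try:
    probe, tracked = make_cycle()
    # The local names a and b are gone, but the cycle keeps both objects alive.
    alive_before = probe() is not None
    gc.collect()  # cyclic gc traverses the tracked objects and frees the cycle
    alive_after = probe() is not None
finally:
    gc.enable()
```

On CPython, alive_before is true and alive_after is false: only the
collector, not reference counting, gets rid of the cycle.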

tp_dealloc should then untrack it.  In a debug build, that will
assert-fail if the object hasn't actually been tracked.
PyObject_GC_Del will untrack it for you (if it's still tracked), but
it's risky to rely on that --  it's too easy to forget that Py_DECREFs
on contained objects can end up executing arbitrary Python code (via
__del__ and weakref callbacks, and via allowing other threads to run),
which can in turn trigger a round of cyclic gc *while* your tp_dealloc
is still running.  So it's safest to untrack the object very early in
tp_dealloc.
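
The hazard can be sketched from the Python side (again only an
illustration; Owner, Member and the events list are made-up names, and
the explicit gc.collect() call stands in for a collection that could
otherwise be triggered by any allocation).  Tearing down the owner drops
the last reference to its member, and the member's __del__ then runs
arbitrary code, including a full gc pass, before the owner's
deallocation has finished.

```python
import gc

events = []  # records the order in which teardown steps happen


class Member:
    """Hypothetical contained object whose finalizer runs arbitrary code."""

    def __del__(self):
        events.append("member finalizer start")
        gc.collect()  # a round of cyclic gc in the middle of the owner's teardown
        events.append("member finalizer end")


class Owner:
    """Hypothetical container holding the only reference to a Member."""

    def __init__(self):
        self.member = Member()

    def __del__(self):
        events.append("owner finalizer")


owner = Owner()
del owner  # last reference gone: Owner is deallocated, which releases Member
```

For Python-level classes the interpreter handles the untracking itself;
a hand-written tp_dealloc has no such safety net, which is why the
untrack call belongs before the first Py_DECREF of a member.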

I doubt this happens in the csv module, but an untrack/track pair
should also be put around any block of method code that temporarily
puts the object into a tp_traverse-insane state and that contains any
C API calls that may end up triggering cyclic gc.  That's very rare.
