[Python-Dev] C API for gc.enable() and gc.disable()
tjreedy at udel.edu
Sat Jun 21 20:40:16 CEST 2008
Kevin Jacobs <jacobs at bioinformed.com> wrote:
> I can say with complete certainty that of the 20+ programmers I've had
> working for me, many who have used Python for 3+ years, not a single one
> would think to question the garbage collector if they observed the kind
> of quadratic time complexity I've demonstrated. This is not because
> they are stupid, but because they have only a vague idea that Python
> even has a garbage collector, never mind that it could be behaving badly
> for such innocuous looking code.
As I understand it, gc is needed now more than ever because new style
classes make reference cycles more common. On the other hand, greatly
increased RAM size (from some years ago) makes megaobject bursts
possible. Such large bursts move the hidden quadratic do-nothing drag
out of the relatively flat part of the curve (total time just double or
triple what it should be) to where it can really bite. Leaving aside
what you do for your local group, can we better warn Python programmers
now, for the upcoming 2.5, 2.6, and 3.0 releases?
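For anyone who wants to check how much of this drag shows up on their own
machine, here is a rough sketch that times one burst of allocations with the
cyclic collector on and off. The helper name time_burst and the burst sizes
are made up for illustration, and the size of the gap (if any) depends
heavily on the interpreter version and its collection thresholds, so treat
it as a measuring stick rather than a benchmark.

```python
import gc
import time


def time_burst(n, collector_enabled):
    """Return the seconds taken to build a list of n small tuples.

    time_burst is a hypothetical helper, not part of the gc module.
    """
    was_enabled = gc.isenabled()
    if collector_enabled:
        gc.enable()
    else:
        gc.disable()
    try:
        start = time.perf_counter()
        # Tuples are container objects tracked by the collector,
        # but this data contains no reference cycles.
        data = [(i, i + 1) for i in range(n)]
        elapsed = time.perf_counter() - start
    finally:
        # Put the collector back the way we found it.
        if was_enabled:
            gc.enable()
        else:
            gc.disable()
    del data
    return elapsed


if __name__ == "__main__":
    # Burst sizes are arbitrary; grow them to look for super-linear scaling.
    for n in (250000, 500000, 1000000):
        with_gc = time_burst(n, True)
        without_gc = time_burst(n, False)
        print("%9d objects: gc on %.3fs, gc off %.3fs" % (n, with_gc, without_gc))
```

If the "gc on" column grows noticeably faster than the "gc off" column as n
doubles, that is the do-nothing drag described above.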
Paragraph 3 of the Reference Manual chapter on Data Model (3.0 version) says:
"Objects are never explicitly destroyed; however, when they become
unreachable they may be garbage-collected. An implementation is allowed
to postpone garbage collection or omit it altogether — it is a matter of
implementation quality how garbage collection is implemented, as long as
no objects are collected that are still reachable. (Implementation note:
the current implementation uses a reference-counting scheme with
(optional) delayed detection of cyclically linked garbage, which
collects most objects as soon as they become unreachable, but is not
guaranteed to collect garbage containing circular references. See the
documentation of the gc module for information on controlling the
collection of cyclic garbage.)"
I am not sure what to add here (especially for those who do not read it ;-).
The Library Manual gc section says "Since the collector supplements the
reference counting already used in Python, you can disable the collector
if you are sure your program does not create reference cycles." Perhaps
it should also say "You should disable it when creating millions of
objects without cycles".
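As a sketch of what that advice could look like in user code (my own
illustration, not wording from the docs): the gc module's disable(),
enable(), and isenabled() functions are enough to pause the cyclic collector
around a burst and restore its previous state afterwards. The names
gc_paused and load_records are hypothetical.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Temporarily disable the cyclic collector (hypothetical helper).

    Reference counting still frees acyclic objects while paused;
    only the detection of cyclic garbage is postponed.
    """
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        # Re-enable only if the collector was on when we started.
        if was_enabled:
            gc.enable()


def load_records(n):
    """Build n small dicts in one burst; assumed to create no cycles."""
    with gc_paused():
        return [{"id": i, "value": str(i)} for i in range(n)]
```

This is only safe when the code inside the block really does not create
reference cycles, or when a later gc.collect() is acceptable for cleaning up
whatever cycles it does create.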
The installed documentation set (on Windows, at least) includes some
Python HOWTOs. If one were added on Space Management (implementations,
problems, and solutions), would your developers read it?
> Maybe we should consider more carefully before declaring the status quo
> sufficient. Average developers do allocate millions of objects in
> bursts and super-linear time complexity for such operations is not
> acceptable. Thankfully I am around to help my programmers work around
> such issues or else they'd be pushing to switch to Java, Ruby, C#, or
> whatever since Python was inexplicably "too slow" for "real work". This
> being open source, I'm certainly willing to help in the effort to do so,
> but not if potential solutions will be ruled out as being unnecessary.
To me, 'sufficient' (time-dependent) and 'necessary' are either too
vague or too strict to bring about what you want -- change. This is
the third thread I have read (here + c.l.p) on default-mode gc problems
(but all in the last couple of years or so). So, especially with the
nice table someone posted recently, on time with and without gc, and
considering that installed RAM continues to grow, I am persuaded that
an improvement to the default behavior that does not negatively impact
the vast majority would be desirable.
Terry Jan Reedy