> Tim tells Guido again that he finds the Java rules bad, slinging some
> mud at Guy Steele, but without explaining what the problem with them
> is ...
Slinging mud? Let's back off here. You've read the Java spec and were
impressed. That's fine -- it is impressive <wink>. But go on from
there and see where it leads in practice. That Java's GC model did a
masterful job but includes a finalization model users dislike is really
just conventional wisdom in the Java world. My sketch of Guy Steele's
involvement was an attempt to explain why both halves of that are valid.
I didn't think "explaining the problem" was necessary, as it's been
covered in depth multiple times in c.l.py threads, by Java programmers
as well as by me. Searching the web for articles about this turns up
many; the first one I hit is typical:
Consequently we recommend that [Java] programmers support but do
not rely on finalization. That is, place all finalization semantics
in finalize() methods, but call those methods explicitly and in the
order required. The points below provide more detail.
That's par for the Java course: advice to write finalizers to survive
being called multiple times, call them explicitly, and do all you can
to ensure that the "by magic" call is a nop. The lack of ordering
rules in the language forces people to "do it by hand" (as the Java
spec acknowledges: "It is straightforward to implement a Java class
that will cause a set of finalizer-like methods to be invoked in a
specified order for a set of objects when all the objects become
unreachable. Defining such a class is left as an exercise for the
reader." But from what I've seen, that exercise is beyond the
imagination of most Java programmers! The perceived need for ordering
It's fine that you want to restrict finalizers to "simple" cases; it's
not so fine if the language can't ensure that simple cases are the only
ones the user can write, & can neither detect & complain at runtime
about cases it didn't intend to support. The Java spec is unhelpful
Therefore, we recommend that the design of finalize methods be kept
simple and that they be programmed defensively, so that they will
work in all cases.
Mom and apple pie, but what does it mean, exactly? The spec realizes
that you're going to be tempted to try things that won't work, but
can't really explain what those are in terms simpler than the full set
of implementation consequences. As a result, users hate it -- but
don't take my word for that! If you look & don't find that Java's
finalization rules are widely viewed as "a problem to be wormed around"
by serious Java programmers, fine -- then you've got a much better
search engine than mine <wink>.
As for why I claim following topsort rules is very likely to work out
better, they follow from the nature of the problem, and can be
explained as such, independent of implementation details. See the
Boehm reference for more about topsort.
will-personally-use-python-regardless-ly y'rs - tim
The following "problem" is easy to fix. However, what I wanted to know is
if people (Skip and Guido most importantly) think it is a problem:
>>> "a" in u"bbba"
>>> u"a" in "bbba"
Traceback (innermost last):
File "<stdin>", line 1, in ?
TypeError: string member test needs char left operand
Suggested fix: in stringobject.c, explicitly allow a unicode char left
Moshe Zadka <mzadka(a)geocities.com>.
David Chase maintains a well-regarded GC FAQ, at
Interested folks should look it up. A couple highlights:
On cycles with finalizers:
In theory, of course, a cycle in the graph of objects to be finalized
will prevent a topological sort from succeeding. In practice, the
"right" thing to do appears to be to signal an error (at least when
debugging) and let the programmer clean this up. People with experience
on large systems report that such cycles are in fact exceedingly rare
(note, however, that some languages define "finalizers" for almost
every object, and that was not the case for the large systems studied
-- there, finalizers were not too common).
On Java's "finalizer called only once" rule:
if an object is revived in finalization, that is fine, but its
finalizer will not run a second time. (It isn't clear if this is a
matter of design, or merely an accident of the first implementation
of the language, but it is in the specification now. Obviously, this
encourages careful use of finalization, in much the same way that
driving without seatbelts encourages careful driving.)
Until today, I had no idea I was so resolutely conventional <wink>.
y'rs - tim
Eiffel is Bertrand Meyer's "design by contract" OO language. Meyer took
extreme care in its design, and has written extensively and articulately
about the design -- agree with him or not, he's always worth reading!
I used Eiffel briefly a few years ago, just out of curiosity. I didn't
recall even bumping into a notion of destructors. Turns out it does have
them, but they're appallingly (whether relative to Eiffel's usual clarity,
or even relative to C++'s usual lack thereof <0.9 wink>) ill-specified.
An Eiffel class can register a destructor by inheriting from the system
MEMORY class and overriding the latter's "dispose()". This appears to be
viewed as a low-level facility, and neither OOSC (2nd ed) nor "Eiffel: The
Language" say much about its semantics. Within dispose, you're explicitly
discouraged from invoking methods on *any* other object, and resurrection is
right out the window. But the language doesn't appear to check for any of
that, which is extremely un-Eiffel-like. Many msgs on comp.lang.eiffel from
people who should know suggest that all but one Eiffel implementation pay no
attention at all to reachability during gc, and that none support
resurrection. If you need ordering during finalization, the advice is to
write that part in C/C++. Violations of the vague rules appear to lead to
random system damage(!).
Looking at various Eiffel pkgs on the web, the sole use of dispose was in
one-line bodies that released external resources (like memory & db
connections) via calling an external C/C++ function.
jealous-&-appalled-at-the-same-time<wink>-ly y'rs - tim
Folks, let's not forget that python-dev is a place where oftentimes
half-baked ideas will get advanced. I came up with an idea about decoupling
error handling from exception message strings. I don't expect my idea to be
adopted as is. Similarly, Barry's ideas about object timestamps were
admittedly conceived late at night in the thrill following an apparently
good gig. (I like the idea that every object has a modtime, but for other
reasons than Barry suggested.)
My feeling is that bad ideas will get winnowed out or drastically modified
quickly enough anyway. Think of these early ideas as little more than
brainstorms. A lot of times if I have an idea, I feel I need to put it down
on my virtual whiteboard quickly, because a) I often don't have a lot of
time to pursue stuff (do it now or it won't get done), b) because bad ideas
can be the catalyst for better ideas, and c) if I don't do it immediately,
I'll probably forget the idea altogether, thus missing the opportunity for
reason b altogether.
Try and collect a bunch of ideas before shooting any down and see what falls
out. The best ideas will survive. When people start proving things and
using fancy diagrams like "a <=> b -> C", then go ahead and get picky... ;-)
Have a relaxing, thought provoking weekend. I'm going to go see a movie
this evening with my wife and youngest son, appropriately enough titled, "My
Dog Skip". Enough Pythoneering for one day...
Skip Montanaro | http://www.mojam.com/
skip(a)mojam.com | http://www.musi-cal.com/
I would love to test the Python 1.6 (Unicode support) in Chinese language
aspect, but I don't know where I can get a copy of OS that supports
Chinese. Anyone can point me a direction?
From: Guido van Rossum [SMTP:email@example.com]
Sent: Saturday, March 11, 2000 12:20 AM
To: Python mailing list; python-announce(a)python.org; python-dev(a)python.org;
Cc: Marc-Andre Lemburg
Subject: Unicode patches checked in
I've just checked in a massive patch from Marc-Andre Lemburg which
adds Unicode support to Python. This work was financially supported
by Hewlett-Packard. Marc-Andre has done a tremendous amount of work,
for which I cannot thank him enough.
We're still awaiting some more things: Marc-Andre gave me
documentation patches which will be reviewed by Fred Drake before they
are checked in; Fredrik Lundh has developed a new regular expression
which is Unicode-aware and which should be checked in real soon now.
Also, the documentation is probably incomplete and will be updated,
and of course there may be bugs -- this should be considered alpha
software. However, I believe it is quite good already, otherwise I
wouldn't have checked it in!
I'd like to invite everyone with an interest in Unicode or Python 1.6
to check out this new Unicode-aware Python, so that we can ensure a
robust code base by the time Python 1.6 is released (planned release
date: June 1, 2000). The download links are below.
Instructions on how to get access to the CVS version.
(David Ascher is making nightly tarballs of the CVS version
available at http://starship.python.net/crew/da/pythondists/)
The latest version of the specification on which the Marc
has based his implementation.
Home page of the i18n-sig (Internationalization SIG), which has
lots of other links about this and related issues.
The Python Bugs List. Use this for all bug reports.
Note that next Tuesday I'm going on a 10-day trip, with limited time
to read email and no time to solve problems. The usual crowd will
take care of urgent updates. See you at the Intel Computing Continuum
Conference in San Francisco or at the Python Track at Software
Development 2000 in San Jose!
--Guido van Rossum (home page: http://www.python.org/~guido/)
Christian Tismer just did an exhaustive search for thread unsafe use
of Python operations, and found two weaknesses. One is
posix.listdir(), which I had already found; the other is
file.writelines(). Here's a program that demonstrates the bug;
basically, while writelines is walking down the list, another thread
could truncate the list, causing PyList_GetItem() to fail or a string
object to be deallocated while writelines is using it. On my SOlaris
7 system it typically crashes in the first or second iteration.
It's easy to fix: just don't use release the interpreter lock (get rid
of Py_BEGIN_ALLOW_THREADS c.s.). This would however prevent other
threads from doing any work while this thread may be blocked for I/O.
An alternative solution is to put Py_BEGIN_ALLOW_THREADS and
Py_END_ALLOW_THREADS just around the fwrite() call. This is safe, but
would require a lot of lock operations and would probably slow things
down too much.
--Guido van Rossum (home page: http://www.python.org/~guido/)
def good_guy(fp, list):
t0 = time.time()
t1 = time.time()
print fp.tell(), "bytes written"
def bad_guy(dt, list):
time.sleep(random.random() * dt)
infn = "/usr/dict/words"
infn = sys.argv
print "reading %s..." % infn
fp = open(infn)
list = fp.readlines()
print "read %d lines" % len(list)
tfn = tempfile.mktemp()
fp = None
fp = open(tfn, "w")
dt = 0.0
n = 3
for i in range(n):
dt = dt + good_guy(fp, list)
dt = dt / n # average time it took to write the list to disk
print "dt =", round(dt, 3)
i = 0
i = i+1
print "test", i
copy = map(lambda x: x[1:], list)
thread.start_new_thread(bad_guy, (dt, copy))
[Tim, explaining something I was thinking about more clearly than
I ever could]
>It's not obvious, but the SCCs can be found in linear time (via Tarjan's
>algorithm, which is simple but subtle;
Wow, it seems like it should be more expensive than that. What
are the space requirements? Also, does the simple algorithm you
used in Cyclops have a name?
>If there are no safe nodes without predecessors, GC is stuck,
>and for good reason: every object in the whole pile is reachable
>from an object with a finalizer, which could change the topology
>in near-arbitrary ways. The unsafe nodes without predecessors
>(and again, by #4, there must be at least one) are the heart of
>the problem, and this scheme identifies them precisely.
Exactly. What is our policy on these unsafe nodes? Guido seems
to feel that it is okay for the programmer to create them and
Python should have a way of collecting them. Tim seems to feel
that the programmer should not create them in the first place. I
agree with Tim.
If topological finalization is used, it is possible for the
programmer to design their classes so that this problem does not
happen. This is explained on Hans Boehm's finalization web page.
If the programmer can or does not redesign their classes I don't
think it is unreasonable to leak memory. We can link these
cycles to a global list of garbage or print a debugging message.
This is a large improvement over the current situation (ie.
leaking memory with no debugging even for cycles without
"If you're a great programmer, you make all the routines depend on each
other, so little mistakes can really hurt you." -- Bill Gates, ca. 1985.
Warning: long message. If you're not interested in reading all this,
please skip to "Conclusion" at the end.
At Tim's recommendation I had a look at what section 12.6 of the Java
language spec says about finalizers. The stuff there is sure seductive
for language designers...
Have a look at te diagram at
http://java.sun.com/docs/books/jls/html/12.doc.html#48746. In all its
(seeming) complexity, it helped me understand some of the issues of
finalization better. Rather than the complex 8-state state machine
that it appears to be, think of it as a simple 3x3 table. The three
rows represent the categories reachable, finalizer-reachable
(abbreviated in the diagram as f-reachable), and unreachable. These
categories correspond directly to categories of objects that the
Schemenauer-Tiedemann cycle-reclamation scheme deals with: after
moving all the reachable objects to the second list (first the roots
and then the objects reachable from the roots), the first list is left
with the unreachable and finalizer-reachable objects.
If we want to distinguish between unreachable and finalizer-reachable
at this point, a straightforward application of the same algorithm
will work well: Create a third list (this will contain the
finalizer-reachable objects). Start by filling it with all the objects
from the first list (which contains the potential garbage at this
point) that have a finalizer. We can look for objects that have
__del__ or __clean__ or for which tp_clean(CARE_EXEC)==true, it
doesn't matter here.(*) Then walk through the third list, following
each object's references, and move all referenced objects that are
still in the first list to the third list. Now, we have:
List 1: truly unreachable objects. These have no finalizers and can be
discarded right away.
List 2: truly reachable objects. (Roots and objects reachable from
roots.) Leave them alone.
List 3: finalizer-reachable objects. This contains objects that are
unreachable but have a finalizer, and objects that are only reachable
We now have to decide on a policy for invoking finalizers. Java
suggests the following: Remember the "roots" of the third list -- the
nodes that were moved there directly from the first list because they
have a finalizer. These objects are marked *finalizable* (a category
corresponding to the second *column* of the Java diagram). The Java
spec allows the Java garbage collector to call all of these finalizers
in any order -- even simultaneously in separate threads. Java never
allows an object to go back from the finalizable to the unfinalized
state (there are no arrows pointing left in the diagram). The first
finalizer that is called could make its object reachable again (up
arrow), thereby possibly making other finalizable objects reachable
too. But this does not cancel their scheduled finalization! The
conclusion is that Java can sometimes call finalization on unreachable
objects -- but only if those objects have gone through a phase in
their life where they were unreachable or at least
I agree that this is the best that Java can do: if there are cycles
containing multiple objects with finalizers, there is no way (short of
asking the programmer(s)) to decide which object to finalize first. We
could pick one at random, run its finalizer, and start garbage
collection all over -- if the finalizer doesn't resurrect anything,
this will give us the same set of unreachable objects, from which we
could pick the next finalizable object, and so on. That looks very
inefficient, might not terminate (the same object could repeatedly
show up as the candidate for finalization), and it's still arbitrary:
the programmer(s) still can't predict which finalizer in a cycle with
multiple finalizers will be called first. Assuming the recommended
characteristics of finalizers (brief and robust), it won't make much
difference if we call all finalizers (of the now-finalizeable objects)
"without looking back". Sure, some objects may find themselves in a
perfectly reachable position with their finalizer called -- but they
did go through a "near-death experience". I don't find this
objectionable, and I don't see how Java could possibly do better for
cycles with multiple finalizers.
Now let's look again at the rule that an object's finalizer will be
called at most once automatically by the garbage collector. The
transitions between the colums of the Java diagram enforce this: the
columns are labeled from left to right with unfinalized, finalizable,
and finalized, and there are no transition arrows pointing left. (In
my description above, already finalized objects are considered not to
have a finalizer.) I think this rule makes a lot of sense given Java's
multi-threaded garbage collection: the invocation of finalizers could
run concurreltly with another garbage collection, and we don't want
this to find some of the same finalizable objects and call their
We could mark them with a "finalization in progress" flag only while
their finalizer is running, but in a cycle with multiple finalizers it
seems we should keep this flag set until *all* finalizers for objects
in the cycle have run. But we don't actually know exactly what the
cycles are: all we know is "these objects are involved in trash
cycles". More detailed knowledge would require yet another sweep, plus
a more hairy two-dimensional data structure (a list of separate
cycles). And for what? as soon as we run finalizers from two separate
cycles, those cycles could be merged again (e.g. the first finalizer
could resurrect its cycle, and the second one could link to it). Now
we have a pool of objects that are marked "finalization in progress"
until all their finalizations terminate. For an incremental concurrent
garbage collector, this seems a pain, since it may continue to find
new finalizable objects and add them to the pile. Java takes the
logical conclusion: the "finalization in progress" flag is never
cleared -- and renamed to "finalized".
Are the Java rules complex? Yes. Are there better rules possible? I'm
not so sure, given the requirement of allowing concurrent incremental
garbage collection algorithms that haven't even been invented
yet. (Plus the implied requirement that finalizers in trash cycles
should be invoked.) Are the Java rules difficult for the user? Only
for users who think they can trick finalizers into doing things for
them that they were not designed to do. I would think the following
guidelines should do nicely for the rest of us:
1. Avoid finalizers if you can; use them only to release *external*
(e.g. OS) resources.
2. Write your finalizer as robust as you can, with as little use of
other objects as you can.
3. Your only get one chance. Use it.
Unlike Scheme guardians or the proposed __cleanup__ mechanism, you
don't have to know whether your object is involved in a cycle -- your
finalizer will still be called.
I am reconsidering to use the __del__ method as the finalizer. As a
compromise to those who want their __del__ to run whenever the
reference count reaches zero, the finalized flag can be cleared
explicitly. I am considering to use the following implementation:
after retrieving the __del__ method, but before calling it,
self.__del__ is set to None (better, self.__dict__['__del__'] = None,
to avoid confusing __setattr__ hooks). The object call remove
self.__del__ to clear the finalized flag. I think I'll use the same
mechanism to prevent __del__ from being called upon a failed
Final note: the semantics "__del__ is called whenever the reference
count reaches zero" cannot be defended in the light of a migration to
different forms of garbage collection (e.g. JPython). There may not
be a reference count.
--Guido van Rossum (home page: http://www.python.org/~guido/)
(*) Footnote: there's one complication: to ask a Python class instance
if it has a finalizer, we have to use PyObject_Getattr(obj, ...). If
the object's class has a __getattr__ hook, this can invoke arbitrary
Python code -- even if the answer to the question is "no"! This can
make the object reachable again (in the Java diagram, arrows pointing
up or up and right). We could either use instance_getattr1(), which
avoids the __getattr__ hook, or mark all class instances as
finalizable until proven innocent.
Before starting to code away, I would like to know which
of the new Unicode methods should also be available on
Here are the currently available methods:
Unicode objects string objects
translate translate (*)
(*) The two hvae slightly different implementations, e.g.
deletions are handled differently.
Python Pages: http://www.lemburg.com/python/