Mailman 3 March 2000 - Python-Dev

Re: [Python-Dev] finalization again
by Tim_Peters＠Dragonsys.com 11 Mar '00

11 Mar '00

[Guido] > Tim tells Guido again that he finds the Java rules bad, slinging some > mud at Guy Steele, but without explaining what the problem with them > is ... Slinging mud? Let's back off here. You've read the Java spec and were impressed. That's fine -- it is impressive <wink>. But go on from there and see where it leads in practice. That Java's GC model did a masterful job but includes a finalization model users dislike is really just conventional wisdom in the Java world. My sketch of Guy Steele's involvement was an attempt to explain why both halves of that are valid. I didn't think "explaining the problem" was necessary, as it's been covered in depth multiple times in c.l.py threads, by Java programmers as well as by me. Searching the web for articles about this turns up many; the first one I hit is typical: http://www.quoininc.com/quoininc/Design_Java0197.html eventually concludes Consequently we recommend that [Java] programmers support but do not rely on finalization. That is, place all finalization semantics in finalize() methods, but call those methods explicitly and in the order required. The points below provide more detail. That's par for the Java course: advice to write finalizers to survive being called multiple times, call them explicitly, and do all you can to ensure that the "by magic" call is a nop. The lack of ordering rules in the language forces people to "do it by hand" (as the Java spec acknowledges: "It is straightforward to implement a Java class that will cause a set of finalizer-like methods to be invoked in a specified order for a set of objects when all the objects become unreachable. Defining such a class is left as an exercise for the reader." But from what I've seen, that exercise is beyond the imagination of most Java programmers! The perceived need for ordering is not.). It's fine that you want to restrict finalizers to "simple" cases; it's not so fine if the language can't ensure that simple cases are the only ones the user can write, & can neither detect & complain at runtime about cases it didn't intend to support. The Java spec is unhelpful here too: Therefore, we recommend that the design of finalize methods be kept simple and that they be programmed defensively, so that they will work in all cases. Mom and apple pie, but what does it mean, exactly? The spec realizes that you're going to be tempted to try things that won't work, but can't really explain what those are in terms simpler than the full set of implementation consequences. As a result, users hate it -- but don't take my word for that! If you look & don't find that Java's finalization rules are widely viewed as "a problem to be wormed around" by serious Java programmers, fine -- then you've got a much better search engine than mine <wink>. As for why I claim following topsort rules is very likely to work out better, they follow from the nature of the problem, and can be explained as such, independent of implementation details. See the Boehm reference for more about topsort. will-personally-use-python-regardless-ly y'rs - tim

9 17

Unicode: When Things Get Hairy
by Moshe Zadka 11 Mar '00

11 Mar '00

The following "problem" is easy to fix. However, what I wanted to know is if people (Skip and Guido most importantly) think it is a problem: >>> "a" in u"bbba" 1 >>> u"a" in "bbba" Traceback (innermost last): File "<stdin>", line 1, in ? TypeError: string member test needs char left operand Suggested fix: in stringobject.c, explicitly allow a unicode char left operand. -- Moshe Zadka <mzadka(a)geocities.com>. http://www.oreilly.com/news/prescod_0300.html

3 5

Conventional wisdom on finalization
by Tim Peters 10 Mar '00

10 Mar '00

David Chase maintains a well-regarded GC FAQ, at http://www.iecc.com/gclist/GC-faq.html Interested folks should look it up. A couple highlights: On cycles with finalizers: In theory, of course, a cycle in the graph of objects to be finalized will prevent a topological sort from succeeding. In practice, the "right" thing to do appears to be to signal an error (at least when debugging) and let the programmer clean this up. People with experience on large systems report that such cycles are in fact exceedingly rare (note, however, that some languages define "finalizers" for almost every object, and that was not the case for the large systems studied -- there, finalizers were not too common). On Java's "finalizer called only once" rule: if an object is revived in finalization, that is fine, but its finalizer will not run a second time. (It isn't clear if this is a matter of design, or merely an accident of the first implementation of the language, but it is in the specification now. Obviously, this encourages careful use of finalization, in much the same way that driving without seatbelts encourages careful driving.) Until today, I had no idea I was so resolutely conventional <wink>. seems-we're-trying-to-do-more-than-anyone-other-than-us-expects-ly y'rs - tim

1 0

Finalization in Eiffel
by Tim Peters 10 Mar '00

10 Mar '00

Eiffel is Bertrand Meyer's "design by contract" OO language. Meyer took extreme care in its design, and has written extensively and articulately about the design -- agree with him or not, he's always worth reading! I used Eiffel briefly a few years ago, just out of curiosity. I didn't recall even bumping into a notion of destructors. Turns out it does have them, but they're appallingly (whether relative to Eiffel's usual clarity, or even relative to C++'s usual lack thereof <0.9 wink>) ill-specified. An Eiffel class can register a destructor by inheriting from the system MEMORY class and overriding the latter's "dispose()". This appears to be viewed as a low-level facility, and neither OOSC (2nd ed) nor "Eiffel: The Language" say much about its semantics. Within dispose, you're explicitly discouraged from invoking methods on *any* other object, and resurrection is right out the window. But the language doesn't appear to check for any of that, which is extremely un-Eiffel-like. Many msgs on comp.lang.eiffel from people who should know suggest that all but one Eiffel implementation pay no attention at all to reachability during gc, and that none support resurrection. If you need ordering during finalization, the advice is to write that part in C/C++. Violations of the vague rules appear to lead to random system damage(!). Looking at various Eiffel pkgs on the web, the sole use of dispose was in one-line bodies that released external resources (like memory & db connections) via calling an external C/C++ function. jealous-&-appalled-at-the-same-time<wink>-ly y'rs - tim

1 0

on the suitability of ideas tossed out to python-dev
by Skip Montanaro 10 Mar '00

10 Mar '00

Folks, let's not forget that python-dev is a place where oftentimes half-baked ideas will get advanced. I came up with an idea about decoupling error handling from exception message strings. I don't expect my idea to be adopted as is. Similarly, Barry's ideas about object timestamps were admittedly conceived late at night in the thrill following an apparently good gig. (I like the idea that every object has a modtime, but for other reasons than Barry suggested.) My feeling is that bad ideas will get winnowed out or drastically modified quickly enough anyway. Think of these early ideas as little more than brainstorms. A lot of times if I have an idea, I feel I need to put it down on my virtual whiteboard quickly, because a) I often don't have a lot of time to pursue stuff (do it now or it won't get done), b) because bad ideas can be the catalyst for better ideas, and c) if I don't do it immediately, I'll probably forget the idea altogether, thus missing the opportunity for reason b altogether. Try and collect a bunch of ideas before shooting any down and see what falls out. The best ideas will survive. When people start proving things and using fancy diagrams like "a <=> b -> C", then go ahead and get picky... ;-) Have a relaxing, thought provoking weekend. I'm going to go see a movie this evening with my wife and youngest son, appropriately enough titled, "My Dog Skip". Enough Pythoneering for one day... bow-wow-ly y'rs, Skip Montanaro | http://www.mojam.com/ skip(a)mojam.com | http://www.musi-cal.com/

1 0

RE: Unicode patches checked in
by Shichang Zhao 10 Mar '00

10 Mar '00

I would love to test the Python 1.6 (Unicode support) in Chinese language aspect, but I don't know where I can get a copy of OS that supports Chinese. Anyone can point me a direction? -----Original Message----- From: Guido van Rossum [SMTP:guido@python.org] Sent: Saturday, March 11, 2000 12:20 AM To: Python mailing list; python-announce(a)python.org; python-dev(a)python.org; i18n-sig(a)python.org; string-sig(a)python.org Cc: Marc-Andre Lemburg Subject: Unicode patches checked in I've just checked in a massive patch from Marc-Andre Lemburg which adds Unicode support to Python. This work was financially supported by Hewlett-Packard. Marc-Andre has done a tremendous amount of work, for which I cannot thank him enough. We're still awaiting some more things: Marc-Andre gave me documentation patches which will be reviewed by Fred Drake before they are checked in; Fredrik Lundh has developed a new regular expression which is Unicode-aware and which should be checked in real soon now. Also, the documentation is probably incomplete and will be updated, and of course there may be bugs -- this should be considered alpha software. However, I believe it is quite good already, otherwise I wouldn't have checked it in! I'd like to invite everyone with an interest in Unicode or Python 1.6 to check out this new Unicode-aware Python, so that we can ensure a robust code base by the time Python 1.6 is released (planned release date: June 1, 2000). The download links are below. Links: http://www.python.org/download/cvs.html Instructions on how to get access to the CVS version. (David Ascher is making nightly tarballs of the CVS version available at http://starship.python.net/crew/da/pythondists/) http://starship.python.net/crew/lemburg/unicode-proposal.txt The latest version of the specification on which the Marc has based his implementation. http://www.python.org/sigs/i18n-sig/ Home page of the i18n-sig (Internationalization SIG), which has lots of other links about this and related issues. http://www.python.org/search/search_bugs.html The Python Bugs List. Use this for all bug reports. Note that next Tuesday I'm going on a 10-day trip, with limited time to read email and no time to solve problems. The usual crowd will take care of urgent updates. See you at the Intel Computing Continuum Conference in San Francisco or at the Python Track at Software Development 2000 in San Jose! --Guido van Rossum (home page: http://www.python.org/~guido/) -- http://www.python.org/mailman/listinfo/python-list

1 0

writelines() not thread-safe
by Guido van Rossum 10 Mar '00

10 Mar '00

Christian Tismer just did an exhaustive search for thread unsafe use of Python operations, and found two weaknesses. One is posix.listdir(), which I had already found; the other is file.writelines(). Here's a program that demonstrates the bug; basically, while writelines is walking down the list, another thread could truncate the list, causing PyList_GetItem() to fail or a string object to be deallocated while writelines is using it. On my SOlaris 7 system it typically crashes in the first or second iteration. It's easy to fix: just don't use release the interpreter lock (get rid of Py_BEGIN_ALLOW_THREADS c.s.). This would however prevent other threads from doing any work while this thread may be blocked for I/O. An alternative solution is to put Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS just around the fwrite() call. This is safe, but would require a lot of lock operations and would probably slow things down too much. Ideas? --Guido van Rossum (home page: http://www.python.org/~guido/) import os import sys import thread import random import time import tempfile def good_guy(fp, list): t0 = time.time() fp.seek(0) fp.writelines(list) t1 = time.time() print fp.tell(), "bytes written" return t1-t0 def bad_guy(dt, list): time.sleep(random.random() * dt) del list[:] def main(): infn = "/usr/dict/words" if sys.argv[1:]: infn = sys.argv[1] print "reading %s..." % infn fp = open(infn) list = fp.readlines() fp.close() print "read %d lines" % len(list) tfn = tempfile.mktemp() fp = None try: fp = open(tfn, "w") print "calibrating..." dt = 0.0 n = 3 for i in range(n): dt = dt + good_guy(fp, list) dt = dt / n # average time it took to write the list to disk print "dt =", round(dt, 3) i = 0 while 1: i = i+1 print "test", i copy = map(lambda x: x[1:], list) thread.start_new_thread(bad_guy, (dt, copy)) good_guy(fp, copy) finally: if fp: fp.close() try: os.unlink(tfn) except os.error: pass main()

2 2

finalization again
by nascheme＠enme.ucalgary.ca 09 Mar '00

09 Mar '00

[Tim, explaining something I was thinking about more clearly than I ever could] >It's not obvious, but the SCCs can be found in linear time (via Tarjan's >algorithm, which is simple but subtle; Wow, it seems like it should be more expensive than that. What are the space requirements? Also, does the simple algorithm you used in Cyclops have a name? >If there are no safe nodes without predecessors, GC is stuck, >and for good reason: every object in the whole pile is reachable >from an object with a finalizer, which could change the topology >in near-arbitrary ways. The unsafe nodes without predecessors >(and again, by #4, there must be at least one) are the heart of >the problem, and this scheme identifies them precisely. Exactly. What is our policy on these unsafe nodes? Guido seems to feel that it is okay for the programmer to create them and Python should have a way of collecting them. Tim seems to feel that the programmer should not create them in the first place. I agree with Tim. If topological finalization is used, it is possible for the programmer to design their classes so that this problem does not happen. This is explained on Hans Boehm's finalization web page. If the programmer can or does not redesign their classes I don't think it is unreasonable to leak memory. We can link these cycles to a global list of garbage or print a debugging message. This is a large improvement over the current situation (ie. leaking memory with no debugging even for cycles without finalizers). Neil -- "If you're a great programmer, you make all the routines depend on each other, so little mistakes can really hurt you." -- Bill Gates, ca. 1985.

4 4

finalization again
by Guido van Rossum 07 Mar '00

07 Mar '00

Warning: long message. If you're not interested in reading all this, please skip to "Conclusion" at the end. At Tim's recommendation I had a look at what section 12.6 of the Java language spec says about finalizers. The stuff there is sure seductive for language designers... Have a look at te diagram at http://java.sun.com/docs/books/jls/html/12.doc.html#48746. In all its (seeming) complexity, it helped me understand some of the issues of finalization better. Rather than the complex 8-state state machine that it appears to be, think of it as a simple 3x3 table. The three rows represent the categories reachable, finalizer-reachable (abbreviated in the diagram as f-reachable), and unreachable. These categories correspond directly to categories of objects that the Schemenauer-Tiedemann cycle-reclamation scheme deals with: after moving all the reachable objects to the second list (first the roots and then the objects reachable from the roots), the first list is left with the unreachable and finalizer-reachable objects. If we want to distinguish between unreachable and finalizer-reachable at this point, a straightforward application of the same algorithm will work well: Create a third list (this will contain the finalizer-reachable objects). Start by filling it with all the objects from the first list (which contains the potential garbage at this point) that have a finalizer. We can look for objects that have __del__ or __clean__ or for which tp_clean(CARE_EXEC)==true, it doesn't matter here.(*) Then walk through the third list, following each object's references, and move all referenced objects that are still in the first list to the third list. Now, we have: List 1: truly unreachable objects. These have no finalizers and can be discarded right away. List 2: truly reachable objects. (Roots and objects reachable from roots.) Leave them alone. List 3: finalizer-reachable objects. This contains objects that are unreachable but have a finalizer, and objects that are only reachable through those. We now have to decide on a policy for invoking finalizers. Java suggests the following: Remember the "roots" of the third list -- the nodes that were moved there directly from the first list because they have a finalizer. These objects are marked *finalizable* (a category corresponding to the second *column* of the Java diagram). The Java spec allows the Java garbage collector to call all of these finalizers in any order -- even simultaneously in separate threads. Java never allows an object to go back from the finalizable to the unfinalized state (there are no arrows pointing left in the diagram). The first finalizer that is called could make its object reachable again (up arrow), thereby possibly making other finalizable objects reachable too. But this does not cancel their scheduled finalization! The conclusion is that Java can sometimes call finalization on unreachable objects -- but only if those objects have gone through a phase in their life where they were unreachable or at least finalizer-unreachable. I agree that this is the best that Java can do: if there are cycles containing multiple objects with finalizers, there is no way (short of asking the programmer(s)) to decide which object to finalize first. We could pick one at random, run its finalizer, and start garbage collection all over -- if the finalizer doesn't resurrect anything, this will give us the same set of unreachable objects, from which we could pick the next finalizable object, and so on. That looks very inefficient, might not terminate (the same object could repeatedly show up as the candidate for finalization), and it's still arbitrary: the programmer(s) still can't predict which finalizer in a cycle with multiple finalizers will be called first. Assuming the recommended characteristics of finalizers (brief and robust), it won't make much difference if we call all finalizers (of the now-finalizeable objects) "without looking back". Sure, some objects may find themselves in a perfectly reachable position with their finalizer called -- but they did go through a "near-death experience". I don't find this objectionable, and I don't see how Java could possibly do better for cycles with multiple finalizers. Now let's look again at the rule that an object's finalizer will be called at most once automatically by the garbage collector. The transitions between the colums of the Java diagram enforce this: the columns are labeled from left to right with unfinalized, finalizable, and finalized, and there are no transition arrows pointing left. (In my description above, already finalized objects are considered not to have a finalizer.) I think this rule makes a lot of sense given Java's multi-threaded garbage collection: the invocation of finalizers could run concurreltly with another garbage collection, and we don't want this to find some of the same finalizable objects and call their finalizers again! We could mark them with a "finalization in progress" flag only while their finalizer is running, but in a cycle with multiple finalizers it seems we should keep this flag set until *all* finalizers for objects in the cycle have run. But we don't actually know exactly what the cycles are: all we know is "these objects are involved in trash cycles". More detailed knowledge would require yet another sweep, plus a more hairy two-dimensional data structure (a list of separate cycles). And for what? as soon as we run finalizers from two separate cycles, those cycles could be merged again (e.g. the first finalizer could resurrect its cycle, and the second one could link to it). Now we have a pool of objects that are marked "finalization in progress" until all their finalizations terminate. For an incremental concurrent garbage collector, this seems a pain, since it may continue to find new finalizable objects and add them to the pile. Java takes the logical conclusion: the "finalization in progress" flag is never cleared -- and renamed to "finalized". Conclusion ---------- Are the Java rules complex? Yes. Are there better rules possible? I'm not so sure, given the requirement of allowing concurrent incremental garbage collection algorithms that haven't even been invented yet. (Plus the implied requirement that finalizers in trash cycles should be invoked.) Are the Java rules difficult for the user? Only for users who think they can trick finalizers into doing things for them that they were not designed to do. I would think the following guidelines should do nicely for the rest of us: 1. Avoid finalizers if you can; use them only to release *external* (e.g. OS) resources. 2. Write your finalizer as robust as you can, with as little use of other objects as you can. 3. Your only get one chance. Use it. Unlike Scheme guardians or the proposed __cleanup__ mechanism, you don't have to know whether your object is involved in a cycle -- your finalizer will still be called. I am reconsidering to use the __del__ method as the finalizer. As a compromise to those who want their __del__ to run whenever the reference count reaches zero, the finalized flag can be cleared explicitly. I am considering to use the following implementation: after retrieving the __del__ method, but before calling it, self.__del__ is set to None (better, self.__dict__['__del__'] = None, to avoid confusing __setattr__ hooks). The object call remove self.__del__ to clear the finalized flag. I think I'll use the same mechanism to prevent __del__ from being called upon a failed initialization. Final note: the semantics "__del__ is called whenever the reference count reaches zero" cannot be defended in the light of a migration to different forms of garbage collection (e.g. JPython). There may not be a reference count. --Guido van Rossum (home page: http://www.python.org/~guido/) ____ (*) Footnote: there's one complication: to ask a Python class instance if it has a finalizer, we have to use PyObject_Getattr(obj, ...). If the object's class has a __getattr__ hook, this can invoke arbitrary Python code -- even if the answer to the question is "no"! This can make the object reachable again (in the Java diagram, arrows pointing up or up and right). We could either use instance_getattr1(), which avoids the __getattr__ hook, or mark all class instances as finalizable until proven innocent.

3 4

Adding Unicode methods to string objects
by M.-A. Lemburg 07 Mar '00

07 Mar '00

Before starting to code away, I would like to know which of the new Unicode methods should also be available on string objects. Here are the currently available methods: Unicode objects string objects ------------------------------------ capitalize capitalize center count count encode endswith endswith expandtabs find find index index isdecimal isdigit islower isnumeric isspace istitle isupper join join ljust lower lower lstrip lstrip replace replace rfind rfind rindex rindex rjust rstrip rstrip split split splitlines startswith startswith strip strip swapcase swapcase title title translate translate (*) upper upper zfill (*) The two hvae slightly different implementations, e.g. deletions are handled differently. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/

3 5