I've been thinking about some ideas for reducing the
amount of refcount adjustment that needs to be done,
with a view to making GIL removal easier.
1) Permanent objects
In a typical Python program there are many objects
that are created at the beginning and exist for the
life of the program -- classes, functions, literals,
etc. Refcounting these is a waste of effort, since
they're never going to go away.
So perhaps there could be a way of marking such
objects as "permanent" or "immortal". Any refcount
operation on a permanent object would be a no-op,
so no locking would be needed. This would also have
the benefit of eliminating any need to write to the
object's memory at all when it's only being read.
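A minimal sketch of how that check might look, written here as a Python
model purely for illustration (the PERMANENT sentinel and the dealloc
helper are made up; the real change would live in the C-level
Py_INCREF/Py_DECREF macros):

PERMANENT = -1          # hypothetical sentinel marking an immortal object

def incref(obj):
    if obj.refcnt == PERMANENT:
        return              # permanent object: no adjustment, no lock, no write
    obj.refcnt += 1         # ordinary object: adjusted as usual

def decref(obj):
    if obj.refcnt == PERMANENT:
        return
    obj.refcnt -= 1
    if obj.refcnt == 0:
        dealloc(obj)

def dealloc(obj):
    pass                    # placeholder: reclaim the object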
2) Objects owned by a thread
Python code creates and destroys temporary objects
at a high rate -- stack frames, argument tuples,
intermediate results, etc. If the code is executed
by a thread, those objects are rarely if ever seen
outside of that thread. It would be beneficial if
refcount operations on such objects could be carried
out by the thread that created them without locking.
To achieve this, two extra fields could be added
to the object header: an "owning thread id" and a
"local reference count". (The existing refcount
field will be called the "global reference count"
in what follows.)
An object created by a thread has its owning thread
id set to that thread. When adjusting an object's
refcount, if the current thread is the object's owning
thread, the local refcount is updated without locking.
If the object has no owning thread, or belongs to
a different thread, the object is locked and the
global refcount is updated.
The object is considered garbage only when both
refcounts drop to zero. Thus, after a decref, both
refcounts would need to be checked to see if they
are zero. When decrementing the local refcount and
it reaches zero, the global refcount can be checked
without locking, since a zero will never be written
to it until it truly has zero non-local references
remaining.
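To make the bookkeeping concrete, here is a rough Python model of the
scheme, assuming a header carrying an owning thread id, a local count,
a global count and a per-object lock (all names below are mine; the
real implementation would again be in the C refcount macros):

import thread, threading

class Header(object):
    "Model of the proposed extra object header fields."
    def __init__(self):
        self.owner = thread.get_ident()   # owning thread id
        self.local_refcnt = 1             # touched only by the owner, no lock
        self.global_refcnt = 0            # touched under self.lock
        self.lock = threading.Lock()

def incref(obj):
    if obj.owner == thread.get_ident():
        obj.local_refcnt += 1             # owner thread: no locking needed
    else:
        obj.lock.acquire()
        try:
            obj.global_refcnt += 1
        finally:
            obj.lock.release()

def decref(obj):
    if obj.owner == thread.get_ident():
        obj.local_refcnt -= 1
        # The unlocked read of global_refcnt is safe because, as argued
        # above, zero is never written to it while non-local references
        # remain.
        if obj.local_refcnt == 0 and obj.global_refcnt == 0:
            dealloc(obj)
    else:
        obj.lock.acquire()
        try:
            obj.global_refcnt -= 1
            dead = (obj.global_refcnt == 0 and obj.local_refcnt == 0)
        finally:
            obj.lock.release()
        if dead:
            dealloc(obj)

def dealloc(obj):
    pass                                  # placeholder: reclaim the object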
I suspect that these two strategies together would
eliminate a very large proportion of refcount-related
activities requiring locking, perhaps to the point
where those remaining are infrequent enough to make
GIL removal practical.
--
Greg
I've been working with decorators to make a design by contract
library. This has gotten me thinking about decorators and anonymous
functions.
The two use cases I'll target are callbacks and decorating descriptors,
i.e. (from the Twisted tutorial):
class FingerProtocol(basic.LineReceiver):
    def lineReceived(self, user):
        self.factory.getUser(user
        ).addErrback(lambda _: "Internal error in server"
        ).addCallback(lambda m:
            (self.transport.write(m+"\r\n"),
             self.transport.loseConnection()))
Currently this can be modified to (ab)use a decorator:
    ...
    def lineReceived(self, user):
        @self.factory.getUser(user).addCallback
        def temp(_):
            return "Internal error in server"
    ...
This is an abuse of decorators because it doesn't actually decorate the
function temp; it registers it as a callback for the class FingerProtocol.
I mimic this use of decorators in my design by contract library.
(Guido recently posted a patch on python-dev that would allow
property to be used in the following way):
class C(object):
    x = property(doc="x coordinate")

    @x.setter
    def x(self, x):
        self.__x = x

    @x.getter
    def x(self):
        return self.__x
There was some concern voiced on the list about repeating the variable
name both in the decorator and the function definition.
In other languages, these use cases are handled by anonymous functions
or blocks. Python also has an anonymous function, lambda, but it's
limited to a single expression. There are parsing and readability
issues with making lambda multi-line.
My proposal would be to use a special syntax, perhaps a keyword, to
allow decorators to take an anonymous function.
@self.factory.getUser(user).addCallback
def temp(_):
    return "Internal error in server"
would become:
@self.factory.getUser(user).addCallback
do (_):
    return "Internal error in server"
and:
@x.setter
def x(self, x):
    self.__x = x
would become:
@x.setter
do (self, x):
    self.__x = x
In general:
@DECORATOR
do (..args..):
    ..function..
would be the same as:
def temp(..args..):
    ..function..
DECORATOR(temp)
del temp
The decorators should be stackable as well, so:
@DEC2
@DEC1
do (..args..):
    ..function..
would be the same as:
def temp(..args..):
    ..function..
DEC2(DEC1(temp))
del temp
Other uses/abuses
Perhaps this new 'do' block could be used to implement things like the
following:
@repeatUntilSuccess(times=10)
do ():
    print "Attempting to connect."
    socket.connect()
with repeatUntilSuccess implemented as:
def repeatUntilSuccess(times=None):
    if times is None:
        def repeater(func):
            while True:
                try:
                    func()
                    break
                except Exception:
                    continue
    else:
        def repeater(func):
            for i in range(times):
                try:
                    func()
                    break
                except Exception:
                    continue
    return repeater
@fork
do ():
    time.sleep(10)
    print "This is done in another thread."
    thing.dostuff()
with fork implemented as:
def fork(func):
    t = threading.Thread(target=func)
    t.start()
Open Issues:
~~~~~~~~~~~~
Doing a Google search shows that the word 'do' is occasionally used as
a function/method name. The same goes for the synonyms act and perform.
Reusing def would probably not be a good idea, because it could be
easily confused with a normal decorator and function.
This is still somewhat an abuse of the decorator syntax: it isn't
adding annotations to the anonymous function, so it isn't really
'decorating' it. I do think it's better than using a function that
returns None as a decorator.
While scanning source, the 'do' keyword may easily be confused with an
ordinary 'def'. I'm not really sure what would work better; suggestions
are welcome. Perhaps I'm going down the wrong path by reusing decorator
syntax.
Other possible keywords instead of 'do': anonymous, function, act, perform.
I first thought of reusing the keyword lambda, but I think the
semantics of lambda are too different to be used in this case.
Similarly, the keyword with has different semantics.
Thanks,
--
=====
--Ryan E. Freckleton
(Sorry I can't set In-Reply-To; I've lost the original message.)
Neil Toronto ntoronto at cs.byu.edu:
> While kicking around ideas for multiple dispatch, I came across one of
> Perl's implementations, and it reminded me of something Larry Wall said
> recently in his State of the Onion address about how types get together
> and democratically select a function to dispatch to. I immediately
> thought of Arrow's Theorem.
Maybe I'm wrong, but I don't think this theorem applies in the case of
multiple dispatch. The type of the argument sequence as a whole chooses
the best specialisation; it's not the case that each argument expresses a
preference. The set of choices is the set S of signatures of
specialisations of the function (which is partially ordered), and the
'committee' is made up of one entity only: the function call's signature s.
To choose the relevant specialisation, look at:
{ t \in S | s <= t and \forall u \in S (s < u <= t) \implies u = t }
If this set is a singleton { t } then the specialization with signature t
is the best fit for s, otherwise there is no best fit.
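For concreteness, that rule can be transcribed directly into Python
(the names best_fit and le below are mine; le(a, b) stands for the
partial order a <= b on signatures):

def best_fit(s, S, le):
    "Return the unique best-fitting signature in S for call signature s, or None."
    def lt(a, b):
        return le(a, b) and a != b
    fits = [t for t in S
            if le(s, t) and all(u == t for u in S if lt(s, u) and le(u, t))]
    if len(fits) == 1:
        return fits[0]
    return None                 # no (unique) best fit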
--
Arnaud
In the generator expression
(x+1 for x in L)
the name 'L' is not local to the expression (as opposed to 'x'); I
will call such a name a 'free name', as I am not aware of an existing
terminology.
The following is about how to treat 'free names' inside generator
expressions. I argue that these names should be bound to their values
when the generator expression is created, and the rest of this email
tries to provide arguments why this may be desirable.
I am fully aware that I tend to think about things in a very skewed
manner though, so I would be grateful for any rebuttal.
Recently I tried to implement a 'sieve' algorithm using generator
expressions instead of lists. It wasn't the sieve of Eratosthenes but
I will use that as a simpler example (all of the code shown below is
Python 2.5). Here is one way to implement the algorithm using list
comprehensions (tuples could also be used as the mutability of lists
is not used):
def list_sieve(n):
    "Generate all primes less than n"
    P = range(2, n)
    while True:
        # Yield the first element of P and sieve out its multiples
        for p in P:
            yield p
            P = [x for x in P if x % p]
            break
        else:
            # If P was empty then we're done
            return
>>> list(list_sieve(20))
[2, 3, 5, 7, 11, 13, 17, 19]
So that's ok. Then it occurred to me that I didn't need to keep
creating all these lists; so I decided, without further thinking, to
switch to generator expressions, as they are supposed to abstract the
notion of an iterable object (and in the function above I'm only using
the 'iterable' properties of lists - any other iterable should do).
So I came up with this:
def incorrect_gen_sieve(n):
    "Unsuccessfully attempt to generate all primes less than n"
    # Change range to xrange
    P = xrange(2, n)
    while True:
        for p in P:
            yield p
            # Change list comprehension to generator expression
            P = (x for x in P if x % p)
            break
        else:
            return
>>> list(incorrect_gen_sieve(20))
[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
Ouch. Looking back at the code I realised that this was due to the
fact that the names 'p' and 'P' in the generator expression are not
local to it, so subsequent binding of these names will affect the
expression. I can't really analyse further than this; my head starts
spinning if I try :). So I wrapped the generator expression in a
lambda function to make sure that the names p and P inside it were
bound to what I intended them to be:
def gen_sieve(n):
    "Generate all primes less than n, but without tables!"
    P = xrange(2, n)
    while True:
        for p in P:
            yield p
            # Make sure that p and P are what they should be
            P = (lambda P, p: (x for x in P if x % p))(P, p)
            break
        else:
            return
>>> list(gen_sieve(20))
[2, 3, 5, 7, 11, 13, 17, 19]
That's better. I like the idea of a sieve of Eratosthenes without any
tables, and it's achievable in Python quite easily. The only problem
is the one that I mentioned above, which boils down to:
In a generator expression, free names can be bound to a new object
between the time when they are defined and when they are used,
thus changing the value of the expression.
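Here is a minimal illustration of that point, outside of the sieve (the
names are made up):

n = 1
gen = (x + n for x in range(3))   # 'n' is a free name in the expression
n = 100
print list(gen)                   # prints [100, 101, 102], not [1, 2, 3]

Under the proposal below, 'n' would be frozen when the expression is
created and the result would be [1, 2, 3].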
But I think that the behaviour of generator expressions would be more
controllable and closer to that of 'real sequences' if the free names
they contain were implicitly frozen when the generator expression is
created.
So I am proposing that for example:
(f(x) for x in L)
be equivalent to:
(lambda f, L: (f(x) for x in L))(f, L)
In most cases this would make generator expressions behave more like
list comprehensions. You would be able to read the generator
expression and think "that's what it does" more reliably. Of course
if a free name is bound to a mutable object, then there is always the
chance that this object will be mutated between the creation of the
generator expression and its use.
Lastly, instead of using a generator expression I could have
written:
from itertools import ifilter
from functools import partial
def tools_sieve(n):
"Generate all primes less than n"
P = xrange(2, n)
while True:
for p in P:
yield p
P = ifilter(partial(int.__rmod__, p), P)
break
else:
return
>>> list(tools_sieve(20))
[2, 3, 5, 7, 11, 13, 17, 19]
It obviously works as P and p are 'frozen' when ifilter and partial
are called. If
(f(x) for x in L if g(x))
is to be the moral equivalent of
imap(f, ifilter(g, L))
then my proposal makes even more sense.
--
Arnaud
From: "Arnaud Delobelle" <arno(a)marooned.org.uk>
>
> { t \in S | s <= t and \forall u \in S (s < u <= t) \implies u = t }
Sorry if I'm late to this discussion, but the hard part here
is defining the partial order <=, it seems to me.
I've tried to think about this at times and I keep getting
entangled in some highly recursive graph matching scheme.
I'm sure the Lisp community has done something with this; does
anyone have a reference or something?
For example:
def myFunction(x, y, L):
    z = x+y
    L.append(z)
    return z
What is (or should be) the type of myFunction, x, y, and L?
Well, let's see: type(x) has an addition function that works
with type(y), or maybe type(y) has an __radd__ method that works
with type(x)... or I think Python might try something
involving coercion...? In any case the result type(z) is acceptable to
the append function of type(L)... then I guess myFunction is of type
type(x) x type(y) x type(L) --> type(z)...
I'm lost.
But that's only the start! Then you have to figure out efficient
ways of calculating these type annotations and looking up "minimal"
matching types.
Something tells me that this might be a Turing complete
problem -- but that doesn't mean you can't come up with
a reasonable and useful weak approximation. Please inform
if I'm going over old ground or otherwise am missing something.
Thanks, -- Aaron Watters
===
http://www.xfeedme.com/nucular/pydistro.py/go?FREETEXT=melting
I apologize if this is well-known, but it's new to me. It may be
something to keep in mind when we occasionally dabble in generic
functions or multimethods. Hopefully there's a way around it.
While kicking around ideas for multiple dispatch, I came across one of
Perl's implementations, and it reminded me of something Larry Wall said
recently in his State of the Onion address about how types get together
and democratically select a function to dispatch to. I immediately
thought of Arrow's Theorem.
Arrow's Theorem says that you can't (final, period, that's it) make a
"social choice function" that's fair. A social choice function takes
preferences as inputs and outputs a single preference ordering. (You
could think of types as casting preference votes for functions, for
example, by stating their "distance" from the exact type the function
expects. It doesn't matter exactly how they vote, just that they do it.)
"Fair" means the choice function meets these axioms:
1. The output preference has to be a total order. This is important for
choosing a president or multiple dispatch, since without it you can't
choose a "winner".
2. There has to be more than one agent (type, here) and more than two
choices. (IOW, with two choices or one agent it's possible to be fair.)
3. If a new choice appears, it can't affect the final pairwise ordering
of two already-existing choices. (Called "independence of irrelevant
alternatives". It's one place most voting systems fail. Bush vs. Clinton
vs. Perot is a classic example.)
4. An agent promoting a choice can't demote it in the final pairwise
ordering. (Called "positive association". It's another place most voting
systems fail.)
5. There has to be some way to get any particular pairwise ordering.
(Called "citizen's sovereignty". Notice this doesn't say anything about
a complete ordering.)
6. No single agent (or type) can control a final pairwise ordering
independent of every other agent. (Called "non-dictatorship".)
Notice what it doesn't say, since some of these are very loose
requirements. In particular, it says nothing about relative strengths of
votes.
What does this mean for multiple dispatch?
It seems to explain why so many different kinds of dispatch algorithms
have been invented: they can't be made "fair" by those axioms, which all
seem to describe desirable dispatch behaviors. No matter how it's done,
either the outcome is intransitive (violates #1), adding a seemingly
unrelated function flips the top two in some cases (violates #3), using
a more specific type can cause a strange demotion (violates #4), or
there's no way to get some specific function as winner (violates #5).
Single dispatch gets around this problem by violating #6 - that is, the
first type is the dictator.
Am I missing something here? I want to be, since multiple dispatch
sounds really nice.
Neil
Hi there,
I thought it would be good to share this code I used in a
project of mine.
I thought it could eventually be incorporated into the Python stdlib
through the os or os.path modules.
Otherwise I'm merely offering it as a community service to anyone who
might be interested.
-- Giampaolo
#!/usr/bin/env python
# linkchainresolver.py

import os, sys, errno

def resolvelinkchain(path):
    """Resolve a chain of symbolic links by returning the absolute path
    name of the final target.

    Raise os.error exception in case of circular link.
    Do not raise exception if the symlink is broken (pointing to a
    non-existent path).
    Return a normalized absolutized version of the pathname if it is
    not a symbolic link.

    Examples:

    >>> resolvelinkchain('validlink')
    /abstarget
    >>> resolvelinkchain('brokenlink')  # resolved anyway
    /abstarget
    >>> resolvelinkchain('circularlink')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "module.py", line 19, in resolvelinkchain
        os.stat(path)
    OSError: [Errno 40] Too many levels of symbolic links: '3'
    >>>
    """
    try:
        os.stat(path)
    except os.error, err:
        # do not raise exception in case of broken symlink;
        # we want to know the final target anyway
        if err.errno == errno.ENOENT:
            pass
        else:
            raise
    if not os.path.isabs(path):
        basedir = os.path.dirname(os.path.abspath(path))
    else:
        basedir = os.path.dirname(path)
    p = path
    while os.path.islink(p):
        p = os.readlink(p)
        if not os.path.isabs(p):
            p = os.path.join(basedir, p)
        basedir = os.path.dirname(p)
    return os.path.join(basedir, p)