I've been thinking about some ideas for reducing the
amount of refcount adjustment that needs to be done,
with a view to making GIL removal easier.
1) Permanent objects
In a typical Python program there are many objects
that are created at the beginning and exist for the
life of the program -- classes, functions, literals,
etc. Refcounting these is a waste of effort, since
they're never going to go away.
So perhaps there could be a way of marking such
objects as "permanent" or "immortal". Any refcount
operation on a permanent object would be a no-op,
so no locking would be needed. This would also have
the benefit of eliminating any need to write to the
object's memory at all when it's only being read.
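A minimal sketch of how that check might look, written here as a Python
model purely for illustration (the PERMANENT sentinel and the dealloc
helper are made up; the real change would live in the C-level
Py_INCREF/Py_DECREF macros):

PERMANENT = -1          # hypothetical sentinel marking an immortal object

def incref(obj):
    if obj.refcnt == PERMANENT:
        return              # permanent object: no adjustment, no lock, no write
    obj.refcnt += 1         # ordinary object: adjusted as usual

def decref(obj):
    if obj.refcnt == PERMANENT:
        return
    obj.refcnt -= 1
    if obj.refcnt == 0:
        dealloc(obj)

def dealloc(obj):
    pass                    # placeholder: reclaim the object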
2) Objects owned by a thread
Python code creates and destroys temporary objects
at a high rate -- stack frames, argument tuples,
intermediate results, etc. If the code is executed
by a thread, those objects are rarely if ever seen
outside of that thread. It would be beneficial if
refcount operations on such objects could be carried
out by the thread that created them without locking.
To achieve this, two extra fields could be added
to the object header: an "owning thread id" and a
"local reference count". (The existing refcount
field will be called the "global reference count"
in what follows.)
An object created by a thread has its owning thread
id set to that thread. When adjusting an object's
refcount, if the current thread is the object's owning
thread, the local refcount is updated without locking.
If the object has no owning thread, or belongs to
a different thread, the object is locked and the
global refcount is updated.
The object is considered garbage only when both
refcounts drop to zero. Thus, after a decref, both
refcounts would need to be checked to see if they
are zero. When decrementing the local refcount and
it reaches zero, the global refcount can be checked
without locking, since a zero will never be written
to it until it truly has zero non-local references
remaining.
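To make the bookkeeping concrete, here is a rough Python model of the
scheme, assuming a header carrying an owning thread id, a local count,
a global count and a per-object lock (all names below are mine; the
real implementation would again be in the C refcount macros):

import thread, threading

class Header(object):
    "Model of the proposed extra object header fields."
    def __init__(self):
        self.owner = thread.get_ident()   # owning thread id
        self.local_refcnt = 1             # touched only by the owner, no lock
        self.global_refcnt = 0            # touched under self.lock
        self.lock = threading.Lock()

def incref(obj):
    if obj.owner == thread.get_ident():
        obj.local_refcnt += 1             # owner thread: no locking needed
    else:
        obj.lock.acquire()
        try:
            obj.global_refcnt += 1
        finally:
            obj.lock.release()

def decref(obj):
    if obj.owner == thread.get_ident():
        obj.local_refcnt -= 1
        # The unlocked read of global_refcnt is safe because, as argued
        # above, zero is never written to it while non-local references
        # remain.
        if obj.local_refcnt == 0 and obj.global_refcnt == 0:
            dealloc(obj)
    else:
        obj.lock.acquire()
        try:
            obj.global_refcnt -= 1
            dead = (obj.global_refcnt == 0 and obj.local_refcnt == 0)
        finally:
            obj.lock.release()
        if dead:
            dealloc(obj)

def dealloc(obj):
    pass                                  # placeholder: reclaim the object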
I suspect that these two strategies together would
eliminate a very large proportion of refcount-related
activities requiring locking, perhaps to the point
where those remaining are infrequent enough to make
GIL removal practical.
--
Greg
I've been working with decorators to make a design by contract
library. This has gotten me thinking about decorators and anonymous
functions.
The two use cases I'll target are callbacks and decorating descriptors,
i.e. (from the Twisted tutorial):
class FingerProtocol(basic.LineReceiver):
    def lineReceived(self, user):
        self.factory.getUser(user
        ).addErrback(lambda _: "Internal error in server"
        ).addCallback(lambda m:
            (self.transport.write(m+"\r\n"),
             self.transport.loseConnection()))
Currently this can be modified to (ab)use a decorator:
    ...
    def lineReceived(self, user):
        @self.factory.getUser(user).addCallback
        def temp(_):
            return "Internal error in server"
    ...
This is an abuse of decorators because it doesn't actually decorate the
function temp; it registers it as a callback for the class FingerProtocol.
I mimic this use of decorators in my design by contract library.
(Guido recently posted a patch on python-dev that would allow
property to be used in the following way):
class C(object):
    x = property(doc="x coordinate")

    @x.setter
    def x(self, x):
        self.__x = x

    @x.getter
    def x(self):
        return self.__x
There was some concern voiced on the list about repeating the variable
name both in the decorator and the function definition.
In other languages, these use cases are handled by anonymous functions
or blocks. Python also has an anonymous function, lambda, but it's
limited to a single expression. There are parsing and readability
issues with making lambda multi-line.
My proposal would be to use a special syntax, perhaps a keyword, to
allow decorators to take an anonymous function.
@self.factory.getUser(user).addCallback
def temp(_):
    return "Internal error in server"
would become:
@self.factory.getUser(user).addCallback
do (_):
    return "Internal error in server"
and:
@x.setter
def x(self, x):
    self.__x = x
would become:
@x.setter
do (self, x):
    self.__x = x
In general:
@DECORATOR
do (..args..):
    ..function..
would be the same as:
def temp(..args..):
    ..function..
DECORATOR(temp)
del temp
The decorators should be stackable as well, so:
@DEC2
@DEC1
do (..args..):
    ..function..
would be the same as:
def temp(..args..):
    ..function..
DEC2(DEC1(temp))
del temp
Other uses/abuses
Perhaps this new 'do' block could be used to implement things like the
following:
@repeatUntilSuccess(times=10)
do ():
    print "Attempting to connect."
    socket.connect()
with repeatUntilSuccess implemented as:
def repeatUntilSuccess(times=None):
    if times is None:
        def repeater(func):
            while True:
                try:
                    func()
                    break
                except Exception:
                    continue
    else:
        def repeater(func):
            for i in range(times):
                try:
                    func()
                    break
                except Exception:
                    continue
    return repeater
@fork
do ():
    time.sleep(10)
    print "This is done in another thread."
    thing.dostuff()
with fork implemented as:
def fork(func):
    t = threading.Thread(target=func)
    t.start()
Open Issues:
~~~~~~~~~~~~
Doing a Google search shows that the word 'do' is occasionally used as
a function/method name. The same goes for the synonyms act and perform.
Reusing def would probably not be a good idea, because it could be
easily confused with a normal decorator and function.
This is still somewhat an abuse of the decorator syntax: it isn't
adding annotations to the anonymous function, so it isn't really
'decorating' it. I do think it's better than using a function that
returns None as a decorator.
While scanning source, the 'do' keyword may easily be confused with an
ordinary 'def'. I'm not really sure what would work better; suggestions
are welcome. Perhaps I'm going down the wrong path by reusing decorator
syntax.
Other possible keywords instead of 'do': anonymous, function, act, perform.
I first thought of reusing the keyword lambda, but I think the
semantics of lambda are too different to be used in this case.
Similarly, the keyword with has different semantics.
Thanks,
--
=====
--Ryan E. Freckleton
(Sorry I can't set In-Reply-To; I've lost the original message.)
Neil Toronto ntoronto at cs.byu.edu:
> While kicking around ideas for multiple dispatch, I came across one of
> Perl's implementations, and it reminded me of something Larry Wall said
> recently in his State of the Onion address about how types get together
> and democratically select a function to dispatch to. I immediately
> thought of Arrow's Theorem.
Maybe I'm wrong, but I don't think this theorem applies in the case of
multiple dispatch. The type of the argument sequence as a whole chooses
the best specialisation; it's not the case that each argument expresses a
preference. The set of choices is the set S of signatures of
specialisations of the function (which is partially ordered), and the
'committee' is made up of one entity only: the function call's signature s.
To choose the relevant specialisation, look at:
{ t \in S | s <= t and \forall u \in S (s < u <= t) \implies u = t }
If this set is a singleton { t } then the specialization with signature t
is the best fit for s, otherwise there is no best fit.
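For concreteness, that rule can be transcribed directly into Python
(the names best_fit and le below are mine; le(a, b) stands for the
partial order a <= b on signatures):

def best_fit(s, S, le):
    "Return the unique best-fitting signature in S for call signature s, or None."
    def lt(a, b):
        return le(a, b) and a != b
    fits = [t for t in S
            if le(s, t) and all(u == t for u in S if lt(s, u) and le(u, t))]
    if len(fits) == 1:
        return fits[0]
    return None                 # no (unique) best fit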
--
Arnaud
In the generator expression
(x+1 for x in L)
the name 'L' is not local to the expression (as opposed to 'x'); I
will call such a name a 'free name', as I am not aware of an existing
terminology.
The following is about how to treat 'free names' inside generator
expressions. I argue that these names should be bound to their values
when the generator expression is created, and the rest of this email
tries to provide arguments why this may be desirable.
I am fully aware that I tend to think about things in a very skewed
manner though, so I would be grateful for any rebuttal.
Recently I tried to implement a 'sieve' algorithm using generator
expressions instead of lists. It wasn't the sieve of Eratosthenes but
I will use that as a simpler example (all of the code shown below is
Python 2.5). Here is one way to implement the algorithm using list
comprehensions (tuples could also be used as the mutability of lists
is not used):
def list_sieve(n):
    "Generate all primes less than n"
    P = range(2, n)
    while True:
        # Yield the first element of P and sieve out its multiples
        for p in P:
            yield p
            P = [x for x in P if x % p]
            break
        else:
            # If P was empty then we're done
            return
>>> list(list_sieve(20))
[2, 3, 5, 7, 11, 13, 17, 19]
So that's ok. Then it occurred to me that I didn't need to keep
creating all these lists; so I decided, without further thinking, to
switch to generator expressions, as they are supposed to abstract the
notion of an iterable object (and in the function above I'm only using
the 'iterable' properties of lists - any other iterable should do).
So I came up with this:
def incorrect_gen_sieve(n):
    "Unsuccessfully attempt to generate all primes less than n"
    # Change range to xrange
    P = xrange(2, n)
    while True:
        for p in P:
            yield p
            # Change list comprehension to generator expression
            P = (x for x in P if x % p)
            break
        else:
            return
>>> list(incorrect_gen_sieve(20))
[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
Ouch. Looking back at the code I realised that this was due to the
fact that the names 'p' and 'P' in the generator expression are not
local to it, so subsequent binding of these names will affect the
expression. I can't really analyse further than this; my head starts
spinning if I try :). So I wrapped the generator expression in a
lambda function to make sure that the names p and P inside it were
bound to what I intended them to be:
def gen_sieve(n):
    "Generate all primes less than n, but without tables!"
    P = xrange(2, n)
    while True:
        for p in P:
            yield p
            # Make sure that p and P are what they should be
            P = (lambda P, p: (x for x in P if x % p))(P, p)
            break
        else:
            return
>>> list(gen_sieve(20))
[2, 3, 5, 7, 11, 13, 17, 19]
That's better. I like the idea of a sieve of Eratosthenes without any
tables, and it's achievable in Python quite easily. The only problem
is the one that I mentioned above, which boils down to:
In a generator expression, free names can be bound to a new object
between the time when they are defined and when they are used,
thus changing the value of the expression.
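Here is a minimal illustration of that point, outside of the sieve (the
names are made up):

n = 1
gen = (x + n for x in range(3))   # 'n' is a free name in the expression
n = 100
print list(gen)                   # prints [100, 101, 102], not [1, 2, 3]

Under the proposal below, 'n' would be frozen when the expression is
created and the result would be [1, 2, 3].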
But I think that the behaviour of generator expressions would be more
controllable and closer to that of 'real sequences' if the free names
they contain were implicitly frozen when the generator expression is
created.
So I am proposing that for example:
(f(x) for x in L)
be equivalent to:
(lambda f, L: (f(x) for x in L))(f, L)
In most cases this would make generator expressions behave more like
list comprehensions. You would be able to read the generator
expression and think "that's what it does" more reliably. Of course
if a free name is bound to a mutable object, then there is always the
chance that this object will be mutated between the creation of the
generator expression and its use.
Lastly, instead of using a generator expression I could have
written:
from itertools import ifilter
from functools import partial
def tools_sieve(n):
"Generate all primes less than n"
P = xrange(2, n)
while True:
for p in P:
yield p
P = ifilter(partial(int.__rmod__, p), P)
break
else:
return
>>> list(tools_sieve(20))
[2, 3, 5, 7, 11, 13, 17, 19]
It obviously works as P and p are 'frozen' when ifilter and partial
are called. If
(f(x) for x in L if g(x))
is to be the moral equivalent of
imap(f, ifilter(g, L))
then my proposal makes even more sense.
--
Arnaud
From: "Arnaud Delobelle" <arno(a)marooned.org.uk>
>
> { t \in S | s <= t and \forall u \in S (s < u <= t) \implies u = t }
Sorry if I'm late to this discussion, but the hard part here
is defining the partial order <=, it seems to me.
I've tried to think about this at times and I keep getting
entangled in some highly recursive graph matching scheme.
I'm sure the Lisp community has done something with this; does
anyone have a reference or something?
For example:
def myFunction(x, y, L):
    z = x+y
    L.append(z)
    return z
What is (or should be) the type of myFunction, x, y, and L?
Well, let's see: type(x) has an addition function that works
with type(y), or maybe type(y) has an __radd__ method that works
with type(x)... or I think Python might try something
involving coercion...? In any case the result type(z) is acceptable to
the append function of type(L)... then I guess myFunction is of type
type(x) x type(y) x type(L) --> type(z)...
I'm lost.
But that's only the start! Then you have to figure out efficient
ways of calculating these type annotations and looking up "minimal"
matching types.
Something tells me that this might be a Turing complete
problem -- but that doesn't mean you can't come up with
a reasonable and useful weak approximation. Please inform
if I'm going over old ground or otherwise am missing something.
Thanks, -- Aaron Watters
===
http://www.xfeedme.com/nucular/pydistro.py/go?FREETEXT=melting
I apologize if this is well-known, but it's new to me. It may be
something to keep in mind when we occasionally dabble in generic
functions or multimethods. Hopefully there's a way around it.
While kicking around ideas for multiple dispatch, I came across one of
Perl's implementations, and it reminded me of something Larry Wall said
recently in his State of the Onion address about how types get together
and democratically select a function to dispatch to. I immediately
thought of Arrow's Theorem.
Arrow's Theorem says that you can't (final, period, that's it) make a
"social choice function" that's fair. A social choice function takes
preferences as inputs and outputs a single preference ordering. (You
could think of types as casting preference votes for functions, for
example, by stating their "distance" from the exact type the function
expects. It doesn't matter exactly how they vote, just that they do it.)
"Fair" means the choice function meets these axioms:
1. The output preference has to be a total order. This is important for
choosing a president or multiple dispatch, since without it you can't
choose a "winner".
2. There has to be more than one agent (type, here) and more than two
choices. (IOW, with two choices or one agent it's possible to be fair.)
3. If a new choice appears, it can't affect the final pairwise ordering
of two already-existing choices. (Called "independence of irrelevant
alternatives". It's one place most voting systems fail. Bush vs. Clinton
vs. Perot is a classic example.)
4. An agent promoting a choice can't demote it in the final pairwise
ordering. (Called "positive association". It's another place most voting
systems fail.)
5. There has to be some way to get any particular pairwise ordering.
(Called "citizen's sovereignty". Notice this doesn't say anything about
a complete ordering.)
6. No single agent (or type) can control a final pairwise ordering
independent of every other agent. (Called "non-dictatorship".)
Notice what it doesn't say, since some of these are very loose
requirements. In particular, it says nothing about relative strengths of
votes.
What does this mean for multiple dispatch?
It seems to explain why so many different kinds of dispatch algorithms
have been invented: they can't be made "fair" by those axioms, which all
seem to describe desirable dispatch behaviors. No matter how it's done,
either the outcome is intransitive (violates #1), adding a seemingly
unrelated function flips the top two in some cases (violates #3), using
a more specific type can cause a strange demotion (violates #4), or
there's no way to get some specific function as winner (violates #5).
Single dispatch gets around this problem by violating #6 - that is, the
first type is the dictator.
Am I missing something here? I want to be, since multiple dispatch
sounds really nice.
Neil
Hi there,
I thought it would be good to share this code I used in a
project of mine.
I thought it could eventually be incorporated into the Python stdlib
through the os or os.path modules.
Otherwise I'm merely offering it as a community service to anyone who
might be interested.
-- Giampaolo
#!/usr/bin/env python
# linkchainresolver.py

import os, sys, errno

def resolvelinkchain(path):
    """Resolve a chain of symbolic links by returning the absolute path
    name of the final target.

    Raise os.error exception in case of circular link.
    Do not raise exception if the symlink is broken (pointing to a
    non-existent path).
    Return a normalized absolutized version of the pathname if it is
    not a symbolic link.

    Examples:

    >>> resolvelinkchain('validlink')
    /abstarget
    >>> resolvelinkchain('brokenlink')  # resolved anyway
    /abstarget
    >>> resolvelinkchain('circularlink')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "module.py", line 19, in resolvelinkchain
        os.stat(path)
    OSError: [Errno 40] Too many levels of symbolic links: '3'
    >>>
    """
    try:
        os.stat(path)
    except os.error, err:
        # do not raise exception in case of broken symlink;
        # we want to know the final target anyway
        if err.errno == errno.ENOENT:
            pass
        else:
            raise
    if not os.path.isabs(path):
        basedir = os.path.dirname(os.path.abspath(path))
    else:
        basedir = os.path.dirname(path)
    p = path
    while os.path.islink(p):
        p = os.readlink(p)
        if not os.path.isabs(p):
            p = os.path.join(basedir, p)
        basedir = os.path.dirname(p)
    return os.path.join(basedir, p)