I've been thinking about some ideas for reducing the
amount of refcount adjustment that needs to be done,
with a view to making GIL removal easier.
1) Permanent objects
In a typical Python program there are many objects
that are created at the beginning and exist for the
life of the program -- classes, functions, literals,
etc. Refcounting these is a waste of effort, since
they're never going to go away.
So perhaps there could be a way of marking such
objects as "permanent" or "immortal". Any refcount
operation on a permanent object would be a no-op,
so no locking would be needed. This would also have
the benefit of eliminating any need to write to the
object's memory at all when it's only being read.
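The idea can be sketched in Python (the IMMORTAL sentinel and the Obj class here are illustrative names I've made up, not any real CPython API):

```python
# Sketch of the "permanent object" idea. A special refcount value marks
# an object as immortal; refcount operations on it become no-ops, and
# crucially never write to the object's memory.
IMMORTAL = -1  # assumed sentinel value, not a real CPython constant

class Obj:
    def __init__(self, refcount=1):
        self.refcount = refcount

def incref(obj):
    if obj.refcount == IMMORTAL:
        return  # permanent object: no-op, no write, no locking
    obj.refcount += 1

def decref(obj):
    if obj.refcount == IMMORTAL:
        return  # permanent objects are never deallocated
    obj.refcount -= 1
```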
2) Objects owned by a thread
Python code creates and destroys temporary objects
at a high rate -- stack frames, argument tuples,
intermediate results, etc. If the code is executed
by a thread, those objects are rarely if ever seen
outside of that thread. It would be beneficial if
refcount operations on such objects could be carried
out by the thread that created them without locking.
To achieve this, two extra fields could be added
to the object header: an "owning thread id" and a
"local reference count". (The existing refcount
field will be called the "global reference count"
in what follows.)
An object created by a thread has its owning thread
id set to that thread. When adjusting an object's
refcount, if the current thread is the object's owning
thread, the local refcount is updated without locking.
If the object has no owning thread, or belongs to
a different thread, the object is locked and the
global refcount is updated.
The object is considered garbage only when both
refcounts drop to zero. Thus, after a decref, both
refcounts would need to be checked to see if they
are zero. When decrementing the local refcount and
it reaches zero, the global refcount can be checked
without locking, since a zero will never be written
to it until it truly has zero non-local references
remaining.
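The two-counter scheme described above can be sketched in Python (field names like owner, local_rc, and global_rc are my own illustrative choices; decref here returns True when the object has become garbage):

```python
import threading

# Sketch only: a real implementation would live in the C object header.
class Obj:
    def __init__(self):
        self.owner = threading.get_ident()  # owning thread id
        self.local_rc = 1                   # local reference count
        self.global_rc = 0                  # global reference count
        self.lock = threading.Lock()

def incref(obj):
    if obj.owner == threading.get_ident():
        obj.local_rc += 1                   # owner thread: no locking
    else:
        with obj.lock:                      # other threads: lock, use global count
            obj.global_rc += 1

def decref(obj):
    """Return True if the object is now garbage (both counts zero)."""
    if obj.owner == threading.get_ident():
        obj.local_rc -= 1
        # The unlocked read of global_rc is safe per the argument above:
        # zero is only ever written once no non-local references remain.
        return obj.local_rc == 0 and obj.global_rc == 0
    with obj.lock:
        obj.global_rc -= 1
        return obj.global_rc == 0 and obj.local_rc == 0
```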
I suspect that these two strategies together would
eliminate a very large proportion of refcount-related
activities requiring locking, perhaps to the point
where those remaining are infrequent enough to make
GIL removal practical.
--
Greg
This is a proposal to add a new built-in named struct:
struct(**kwargs)
Return a struct object which has the attributes given in kwargs.
The name is really unimportant, and I'm open to other ideas, but I
feel that many people have a good idea of what a struct is, and
accessing an object returned from struct would look identical to the
access of a struct in C, so it seems appropriate to me.
The rationale:
It is often helpful to package related information together in order
to make passing the information to various functions more convenient.
The easiest ways (currently) to package such information are to put it
into a tuple (the easiest) or a dict.
Putting the information into a tuple may be easy initially, but it has
high costs later on as it becomes hard to remember what order the
information is in, and to someone reading the code, the intent of the
code is far from clear.
Putting the information in a dict is a bit harder than a tuple, but
since it is more readable later on than a tuple, this method is often
used. Still, the access pattern is more cumbersome than it could be;
foo["bar"] is more cumbersome than, say, foo.bar. This is especially
the case if you have a dict or list of foos, where you then have to
use foos[i]["bar"].
Both tuple and dict solutions suffer down the line when the
information gets to be complicated enough to warrant a class of its
own. It involves changing both the spot in the code where the
information is created (to use the new class constructor), as well as
changing every single field access in the code (changing every foo[0]
or foo["bar"] to foo.bar).
An alternative is to use NamedTuple, NamedDict, NamedList, or to
create your own class. As long as these are more complicated to use
than a tuple or a dict, however, they are not likely to be used for
this purpose. Another problem is that all of these methods require you
to go to the trouble of thinking of a name for your class, and if you
later decide to add more information to your packaged object, you have
to make two changes (in the list of attributes / constructor and in
the place where you instantiate your object).
Enter struct. Using struct is intended to be just as easy as using a
dict (actually, easier when the number of fields is more than two or
three), and not much harder than using a tuple. To declare a struct
foo with attribute bar, you simply use:
foo = struct(bar="barvalue")
Access becomes very easy and readable:
foo.bar
Adding new fields is as easy as changing the initial instantiation, in
one place:
foo = struct(bar="barvalue", baz="bazvalue")
Later on down the line, when you decide that you are doing more with
foo than a struct should be doing, you can easily define a class Foo
which inherits from struct, and since accesses to foo already look
like foo.bar, you only have one spot in the code to change:
foo = Foo(bar="barvalue", baz="bazvalue")
and the rest "just works" with no changes.
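That migration step can be sketched as follows (a minimal stand-in for the proposed struct class is included here only so the example is self-contained; Foo and its method are hypothetical):

```python
class struct(object):
    """Minimal stand-in for the proposed built-in."""
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

class Foo(struct):
    """Later-stage class that grows behaviour beyond a plain struct."""
    def full(self):
        return "%s %s" % (self.bar, self.baz)

# The only line that changes is the instantiation:
foo = Foo(bar="barvalue", baz="bazvalue")
foo.bar     # all existing attribute accesses keep working unchanged
foo.full()  # and new behaviour is available
```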
The implementation:
class struct(object):
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

    def __repr__(self):
        """
        Using self.__class__.__name__ allows classes to inherit from
        struct and automatically have a nice __repr__ method.
        """
        return "%s(%s)" % (self.__class__.__name__,
                           ", ".join("%s=%s" % (attr, repr(val))
                                     for attr, val
                                     in self.__dict__.iteritems()))
                                     # or .items() in Python 3K

    def __str__(self):
        return self.__repr__()

    def __eq__(self, other):
        """
        Implements comparison operation mirroring that of a C struct.
        """
        return self.__dict__ == other.__dict__

    def __ne__(self, other):
        """
        See note for __eq__.
        """
        return not self.__eq__(other)

    def __setattr__(self, name, value):
        """
        I think it makes the most sense for a struct to have a fixed set
        of fields (values may change, but no new fields may be added).
        As soon as you start to add more fields, you should be using
        something other than a struct.
        """
        if name in self.__dict__:
            self.__dict__[name] = value
        else:
            raise AttributeError("'struct' object has no attribute '%s'"
                                 % (name,))

    def __len__(self):
        """
        I'm not sure that it's really necessary to include this, but I
        could see where it might be helpful in some instances.
        """
        return len(self.__dict__)

    def __iter__(self):
        """
        See note for __len__.
        """
        return self.__dict__.itervalues()
        # or .values() in Python 3K
Sample usage:
>>> a = struct(one=1, two=2)
>>> a
struct(two=2, one=1)
>>> a.one
1
>>> a.two
2
>>> a.three
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'struct' object has no attribute 'three'
>>> a.one = "one"
>>> a.one
'one'
>>> a.three = 3
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "struct.py", line 39, in __setattr__
% (name,)))
AttributeError: 'struct' object has no attribute 'three'
>>> b = struct(one=1, two=2)
>>> b
struct(two=2, one=1)
>>> a == b
False
>>> a.one = 1
>>> a == b
True
>>> len(a)
2
>>> print ", ".join(str(v) for v in a)
2, 1
>>> 1 in a
True
>>> "one" in a
False
Ideas or feedback, anyone?
This is based on http://boredzo.org/blog/archives/2007-10-22/generating-all-combinations
. It's also discussed at http://reddit.com/info/5ywrs/comments/ and
someone with a lot of spare time can read Knuth's fascicle on the
subject at http://www-cs-faculty.stanford.edu/~knuth/fasc2a.ps.gz .
OK, suppose you have a situation where you need to loop through all
combinations of 3 lists. The simple solution is to write three nested
for-loops:
for x in xs:
    for y in ys:
        for z in zs:
            # Do something
Of course, this is obviously O(n^3), which is bad, but sometimes you
don't have a choice. However, what if (to use an example I ran into)
you're making up a truth table? If you know you have three variables,
you might write:
for a in [True, False]:
    for b in [True, False]:
        for c in [True, False]:
            print a, b, c
Which yields
True True True
True True False
True False True
True False False
False True True
False True False
False False True
False False False
But if you don't know how many variables you'll have, you're stuck
writing a complicated function instead of using a nice, simple for-loop.
So, going back to the first example, wouldn't it be nicer to write:
for x, y, z in combine(xs, ys, zs):
    # Do something
If nothing else, this has the advantage that if you don't have to nest
the for-loops, things don't end up being so deeply indented on the
screen.
Similarly, a general truth table maker can be written:
ts_and_fs = [[True, False]] * num_vars
for variables in combine(*ts_and_fs):
    print variables
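For reference, this generalized truth-table maker is essentially what itertools.product with its repeat argument provides (product was added to the stdlib in Python 2.6); a sketch:

```python
import itertools

# All assignments of True/False to num_vars variables, in the same
# outer-to-inner order as the nested for-loops above.
num_vars = 3
rows = list(itertools.product([True, False], repeat=num_vars))
for variables in rows:
    print(variables)
```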
So, I think a "combine" function (or some other, better name) would be
a good thing to have, at least in the itertools, if not as a built-in
function. How to implement it? Well, probably it should be done in C
for efficiency, but of the versions presented at http://boredzo.org/blog/archives/2007-10-22/generating-all-combinations
the one that I like best is
def combine(*sequences):
    # Test to guard against infinite recursion
    if sequences:
        def _inner(*seqs):
            if len(seqs) == 1:
                for i in seqs[0]:
                    yield (i,)
            else:
                for rest in combine(*seqs[:-1]):
                    for i in seqs[-1]:
                        yield rest + (i,)
        return _inner(*sequences)
    else:
        # Empty generator that raises StopIteration immediately
        def _inner():
            return
            yield
        return _inner()
However, I'm also convinced that there is a more efficient way to do
this.
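One possible non-recursive variant builds the partial tuples breadth-first instead of recursing (a sketch under the same semantics as the version above, including yielding nothing for an empty argument list; combine_iter is my own name for it):

```python
def combine_iter(*sequences):
    """Yield every tuple taking one element from each sequence."""
    if not sequences:
        return  # match the recursive version: empty generator
    partials = [()]
    for seq in sequences:
        # Extend every partial tuple with each element of this sequence.
        partials = [rest + (item,) for rest in partials for item in seq]
    for combo in partials:
        yield combo
```

Note this trades the recursion for O(product-size) intermediate storage, so it is not obviously better for very large inputs.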
So, what do you think? Is this a common enough need that it should be
built into itertools? The main namespace? Or should we leave it out,
since adding it would encourage people writing O(n^x) algorithms? If
nothing else, list members can enjoy rewriting this function for fun.
-- Carl Johnson
I'd like to propose a signature simplification and a new method for
the Queue.Queue class:
1) Drop the optional `block` argument from put() and get(), since all
the meaningful combinations of (block, timeout) are equivalent to
passing block = (timeout is None or timeout>0). IOW, instead of
passing block=False pass timeout=0 (or negative). Obviously this is to
be considered for 3.x only.
2) Add a new `rotate` method as the atomic equivalent of
put_nowait(get_nowait()). Currently I use the following subclass but
it would be nice to have it in the base Queue:
class RQueue(Queue):
    def rotate(self, n=1):
        '''Rotate this queue n steps to the left (if it is not empty).
        Rotating one step is equivalent to an atomic
        q.put_nowait(q.get_nowait()).
        '''
        if n < 0:
            raise ValueError('n must be non-negative')
        self.mutex.acquire()
        try:
            if not self._empty():
                self._rotate(n)
        finally:
            self.mutex.release()

    def __init__(self, maxsize=0):
        Queue.__init__(self, maxsize)
        if hasattr(self.queue, 'rotate'):  # deque has rotate() since v2.5
            def _rotate(n):
                # negative n for left rotate
                self.queue.rotate(-n)
        else:
            def _rotate(n):
                put, get = self._put, self._get
                for i in xrange(n):
                    put(get())
        self._rotate = _rotate
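For clarity, here is the non-atomic sequence that a one-step rotate makes atomic, written with the Python 3 spelling of the module (queue.Queue rather than Queue.Queue):

```python
import queue

q = queue.Queue()
for item in (1, 2, 3):
    q.put(item)

# Non-atomic equivalent of rotate(1): front element moves to the back.
q.put_nowait(q.get_nowait())

items = [q.get_nowait() for _ in range(3)]
print(items)  # [2, 3, 1]
```

The point of the proposal is that between the get_nowait and the put_nowait above, another thread could observe (or fill) the queue; doing it under self.mutex closes that window.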
What do you think ?
George
On Thu, May 22, 2008 at 6:25 PM, Matthew Russell
<matt.horizon5(a)gmail.com> wrote:
> How about:
> collections.attrs(**kwargs)
> or
> collections.record(**kwargs)
I've used the name "Record" in the past, but when I was trying to
figure out if Python had anything like a C struct, I searched
specifically for "python struct". I'm not sure how many others do the
same, but perhaps using a name like "structure" would be better than
"record"?
(On a side note, see what I mean about the work involved in naming a class? :-)
> I think I prefer the idea of this being a factory that returns an anonymous
> object as opposed to a class:
>
> def record(**kwargs):
>     class Record(object):
>         def __init__(self, **kw):
>             for (k, v) in kwargs.items():
>                 setattr(self, k, v)
>     return Record()
I'm guessing that the last line was meant to be "return Record(**kwargs)"
> This is mainly because a class you later substitute for the
> result of record(**kwargs) may later end up wanting to take kwargs as
> optional values which you might not want as instance variables, or even
> positional arguments.
>
> x = MyThing(x=1,y=2, default_spam=False)
>
>
> So you could easily do:
>
> x = MyThing(default_spam=False, **record_obj.__dict__)
>
> Matt
I can see where you're coming from, but this change would mean that
you would no longer be able to inherit from record (or structure, or
whatever). Also, if you're making that many changes to the class, it
wouldn't be too hard to rewrite the initial instantiation to not use
positional parameters with or without kwargs (after rewriting the
constructor, which you would clearly be doing, anyways). Another
caveat is that each call to record() will create an object of a
different class. Thus, if you have the following:
employees = {}
for ssn, fn, ln in employee_data_list:
employees[ssn] = record(ssn=ssn, first=fn, last=ln)
Then employees[ssn1].__class__ != employees[ssn2].__class__. This
seems like it could be an issue at some point.
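The distinct-class caveat is easy to demonstrate with a working version of the factory (the kwargs/kw slip from the quoted code is fixed here):

```python
def record(**kwargs):
    # Each call defines a brand-new Record class before instantiating it.
    class Record(object):
        def __init__(self, **kw):
            for k, v in kw.items():
                setattr(self, k, v)
    return Record(**kwargs)

a = record(ssn="123", first="Ann")
b = record(ssn="456", first="Bob")
# The instances look alike but their classes are different objects:
type(a) is type(b)  # False
```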
Brandon
I would like to propose to change the built-in function "object" to
have the following syntax:
object(**kwargs)
Return a new featureless object. object is a base for all new style
classes. It has the methods that are common to all instances of new
style classes.
If kwargs is given, the returned object's __dict__ will be kwargs (or
something to that effect).
An example:
a = object(one=1, two=2)
a.one # returns 1
a.two # returns 2
The justification:
On several occasions I have needed a collection of attributes to
represent a single item. When this happens, there are really three
options:
1. Use a tuple. This works well for temporarily re-packaging items in
a loop or for quick-and-dirty code, but it rapidly becomes unreadable
and hard to maintain. It is not long before you forget what order the
attributes are in, and at first glance, it is not clear what kind of
object is being indexed.
2. Use a dict. This is an improvement over tuples on readability, but
they can be a pain to build and overly-cumbersome to access later. I
understand that dicts are used all over the place in Python, but I
still think of them (in the general case) as a map of keys to values
where the dict represents a collection, not an object.
3. Use a class. This requires coming up with a name for the class and
then writing the class (admittedly, this should be easy). Afterwards,
this is the most convenient, readable method for representing the
data, but since it requires non-trivial effort up front, this method
may be avoided until it's truly apparent that it is necessary.
A real-world example:
Let's say that I want to have a map of employee SSNs to employee data.
I am going to be manipulating this information in various ways, but
not really in any ways that justify the use of class methods. At any
rate, let's build this data from a file where each line is
SSN First Last Salary
with the items being whitespace-delimited. The code, then, will be:
employees = {}
for ssn, first, last, salary in (line.split() for line in open(employee_file)):
    employees[ssn] = (ssn, first, last, salary)  # tuple method
    employees[ssn] = {"ssn": ssn, "first": first, "last": last,
                      "salary": salary}  # dict method
    employees[ssn] = Employee(ssn, first, last, salary)  # assumes class
                                                         # with proper constructor
# first name of employee with SSN
employees[SSN][1] # tuple method -- quite unreadable
employees[SSN]["first"] # dict method -- more readable but sub-optimal
employees[SSN].first # class method -- readable and simple
Now, because of the advantages the class method offers in terms of
readability, I have written a convenience class that makes using it
easier:
class Record:
    """
    A class intended to provide a simple interface for creating objects
    with named fields. That is, instead of returning a tuple and indexing
    it or writing a unique class, you can simply do something like:

    a = Record(x=1, y=2)
    a.x  # 1
    a.y  # 2
    a    # Record(x=1, y=2)
    """
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

    def __repr__(self):
        return "Record(%s)" % ", ".join(
            "%s=%s" % field for field in self.__dict__.iteritems())
Now, the line in the for loop above becomes
employees[ssn] = Record(ssn=ssn, first=first, last=last, salary=salary)
and I have completely avoided the need to define a unique class. Note
that as the number of fields increases, this becomes easier to use
inline than the dict method, at the same time that it avoids the
upfront costs of having to build a new class for every distinct type
of object in the program.
It turns out that other people find such a method useful as well.
According to Catherine Devlin on the centralOH Python list, it is
recipe 4.18 from the Python Cookbook (2nd ed.). Several others from
the mailing list stated that they had created similar solutions
themselves.
Thus, my suggestion is to simply build such functionality directly
into the language. While scanning the built-in functions page (I don't
use any, all, or enumerate nearly enough), I noticed the object()
function, and since its purpose is to return a featureless object, it
seems to fit the bill quite well. Adding my suggested functionality
should break nothing (since the kwargs would be optional) and would
allow people to stop baking their own solutions to this common
problem, while getting the speed bonus of a C implementation (if it
helps).
Thus, the code above becomes
for ...:
    employees[ssn] = object(ssn=ssn, first=first, last=last, salary=salary)

employees[SSN].first
Does anyone else think this might be a good idea for Python 2.6/3K?
Thanks,
Brandon
I have 2 related proposals:
1. Give generators a .__name__ attribute that is the same as their current
(3.0a5) .gi_code.co_name subattribute, just as func.__name__ is
func.__code__.co_name.
My reason is, I expect, much the same as that for func.__name__. I am
using the generator name (for bad-iterator-output messages in a test
function) and would prefer to get it through a cross-implementation
'public' interface rather than a CPython internal implementation detail
(which I understand code objects to be). I am otherwise trying to avoid
using CPython internals.
(Is there any plan to change the gi_* attributes the way the func_*
attributes were?)
2. Whether or not 1 is adopted, add the name to the representation:
<gfuncname generator object as..> or <generator object gfuncname at ..>
Conceptually, I see a generator function as an abbreviated version of an
iterator class, with most of the boilerplate removed, that defines a
subclass of the generator class. So I think the subclass name should be
part of its representation.
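For reference, the only way to reach the name today is through the code object; the proposal would surface it directly (the generator function here is my own example):

```python
def squares(n):
    """A trivial generator, used only to show where its name lives."""
    for i in range(n):
        yield i * i

g = squares(3)
# Current 'implementation detail' route to the name:
print(g.gi_code.co_name)  # squares
# Proposal 1 would make this spellable as g.__name__ instead.
```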
Terry Jan Reedy
Should decimal be the default for floating point literals?
E.g.
1.2 would actually be decimal.Decimal("1.2")
and float(1.2) would be used to get a traditional binary floating point value.
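The motivating difference between the two spellings can be shown directly:

```python
from decimal import Decimal

# Binary float literals cannot represent most decimal fractions exactly,
# so small representation errors accumulate:
binary = 0.1 + 0.2
print(binary == 0.3)  # False

# Under the proposal, 1.2 would mean Decimal("1.2"), and decimal
# arithmetic gives the "expected" result:
exact = Decimal("0.1") + Decimal("0.2")
print(exact == Decimal("0.3"))  # True
```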