I've been thinking about some ideas for reducing the
amount of refcount adjustment that needs to be done,
with a view to making GIL removal easier.
1) Permanent objects
In a typical Python program there are many objects
that are created at the beginning and exist for the
life of the program -- classes, functions, literals,
etc. Refcounting these is a waste of effort, since
they're never going to go away.
So perhaps there could be a way of marking such
objects as "permanent" or "immortal". Any refcount
operation on a permanent object would be a no-op,
so no locking would be needed. This would also have
the benefit of eliminating any need to write to the
object's memory at all when it's only being read.
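Just to illustrate, here is a rough, runnable toy sketch of the idea in
Python (the real change would of course live in the C incref/decref
macros; the sentinel value and all the names below are made up):

PERMANENT = -1                        # hypothetical sentinel refcount

class ObjHeader(object):
    """Toy stand-in for a CPython object header."""
    def __init__(self, refcnt=1):
        self.refcnt = refcnt

def incref(obj):
    if obj.refcnt == PERMANENT:
        return                        # permanent object: no write, no lock
    obj.refcnt += 1

def decref(obj):
    if obj.refcnt == PERMANENT:
        return
    obj.refcnt -= 1
    if obj.refcnt == 0:
        print("deallocating", obj)    # stand-in for the real dealloc path

constant = ObjHeader(refcnt=PERMANENT)   # marked immortal at startup
incref(constant)                         # no write to the object's memory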
2) Objects owned by a thread
Python code creates and destroys temporary objects
at a high rate -- stack frames, argument tuples,
intermediate results, etc. If the code is executed
by a thread, those objects are rarely if ever seen
outside of that thread. It would be beneficial if
refcount operations on such objects could be carried
out by the thread that created them without locking.
To achieve this, two extra fields could be added
to the object header: an "owning thread id" and a
"local reference count". (The existing refcount
field will be called the "global reference count"
in what follows.)
An object created by a thread has its owning thread
id set to that thread. When adjusting an object's
refcount, if the current thread is the object's owning
thread, the local refcount is updated without locking.
If the object has no owning thread, or belongs to
a different thread, the object is locked and the
global refcount is updated.
The object is considered garbage only when both
refcounts drop to zero. Thus, after a decref, both
refcounts would need to be checked to see if they
are zero. When the local refcount is decremented and reaches zero,
the global refcount can be checked without locking, since a zero is
never written to it until the object truly has no non-local references
remaining.
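A rough Python-level toy of the decref path just described (the field
and helper names are invented; the real logic would live in C):

import threading

class Obj(object):
    """Toy object header with the two extra fields described above."""
    def __init__(self):
        self.owning_thread = threading.current_thread().ident
        self.local_refcnt = 1         # touched only by the owning thread
        self.global_refcnt = 0        # touched only under self.lock
        self.lock = threading.Lock()

def deallocate(obj):
    print("collecting", obj)          # stand-in for the real dealloc

def decref(obj):
    if obj.owning_thread == threading.current_thread().ident:
        obj.local_refcnt -= 1         # no locking needed
        if obj.local_refcnt == 0 and obj.global_refcnt == 0:
            # Safe to read global_refcnt unlocked: a zero is never
            # written there until no non-local references remain.
            deallocate(obj)
    else:
        with obj.lock:
            obj.global_refcnt -= 1
            garbage = obj.global_refcnt == 0 and obj.local_refcnt == 0
        if garbage:
            deallocate(obj)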
I suspect that these two strategies together would
eliminate a very large proportion of refcount-related
activities requiring locking, perhaps to the point
where those remaining are infrequent enough to make
GIL removal practical.
--
Greg
Here is a nice little Python 3 program, test.py:
import string
buffer = string.ascii_letters
bytes = []
sum = 0
for chr in buffer:
    int = ord(chr)
    if 32 <= int < 127:
        bytes.append(chr)
        sum += 1
str = "".join(bytes)
print(sum, str)
If run as:
python30a -W all test.py
It produces the expected output:
52 abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
But unfortunately it uses buffer, bytes, chr, int, sum, and str as
identifiers. None of these is a keyword, so none of them provokes a
SyntaxError. In fact there are over 130 such identifiers; run
print(dir(__builtins__)) to see them.
I think many newcomers to Python will find it difficult to remember 160
identifiers (keywords + __builtins__) and since some of them have
appealing names (esp. buffer, bytes, min, max, and sum), they may make
use of them without realising that this could cause them problems later
on.
My python-idea is that if Python is run with -W all then it should
report any use of a __builtins__ name as an identifier.
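For what it's worth, here is a stand-alone sketch of the sort of check
that could be performed, using the ast module rather than interpreter
support (the function name is just illustrative):

import ast
import builtins

def report_shadowed_builtins(source, filename="<string>"):
    """Flag assignments to names that also exist in builtins."""
    hits = set()
    for node in ast.walk(ast.parse(source, filename)):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            if hasattr(builtins, node.id):
                hits.add((node.lineno, node.id))
    for lineno, name in sorted(hits):
        print("%s:%d: %r shadows a builtin" % (filename, lineno, name))

with open("test.py") as f:
    report_shadowed_builtins(f.read(), "test.py")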
--
Mark Summerfield, Qtrac Ltd., www.qtrac.eu
I'd post this on Python-dev, but it has more to do with the future of
Python, and it directly impacts the fairly-well-received Python-idea I'm
working on right now.
The current behavior has persisted since revision 9877, nine years ago:
http://svn.python.org/view?rev=9877&view=rev
"Vladimir Marangozov' performance hack: copy f_builtins from ancestor
if the globals are the same."
A variant of the behavior has persisted since the age of the dinosaurs,
as far as I can tell - or at least ever since Python had stack frames.
Here's how the globals/builtins lookup is currently presented as working:
1. If 'name' is in globals, return globals['name']
2. Return globals['__builtins__']['name']
Glossing over a lot of details, here's how it *actually* worked before
the performance hack:
0. A code object gets executed, which creates a stack frame. It
sets frame.builtins = globals['__builtins__'].
While executing the code:
1. If 'name' is in globals, return globals['name'].
2. Otherwise return frame.builtins['name'].
A problem example, which is still a problem today:
__builtins__ = {'len': lambda x: 1}
print len([1, 2, 3])
# prints:
# '3' when run as a script
# '1' in interactive mode
If the code runs as a script or as part of an import, the module's frame
caches builtins, so it doesn't matter that __builtins__ gets reassigned:
when 'len' is looked up for the print statement, it's found in the
cached version.
But in interactive mode, each statement is executed in its own frame, so
it doesn't have this problem.
Well, at least module *functions* will run in their own frames, so
they'll see the new builtins, right? But here's how it works now, after
the performance hack:
0. A code object gets executed, which creates a stack frame.
a. If the stack frame has a parent (think "call site") and
the parent has the same globals, it sets
frame.builtins = parent.builtins.
b. Otherwise it sets frame.builtins = globals['__builtins__'].
While executing the code:
1. If 'name' is in globals, return globals['name'].
2. Otherwise return frame.builtins['name'].
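Modeled very roughly in Python (the real logic lives in the C frame
setup; the class and attribute names are only illustrative), the
post-hack behaviour is:

class Frame(object):
    def __init__(self, globals, parent=None):
        self.globals = globals
        if parent is not None and parent.globals is globals:
            self.builtins = parent.builtins           # inherit the cached dict
        else:
            self.builtins = globals['__builtins__']   # recalculate and cache

    def load_global(self, name):
        if name in self.globals:
            return self.globals[name]
        return self.builtins[name]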
A problem example:
__builtins__ = {'len': lambda x: 1}
def f(): print len([1, 2, 3])
f()
# prints:
# '3' when run as a script
# '1' in interactive mode
At the call site "f()", frame.builtins is the original, cached builtins.
Before the hack, f()'s frame would have recalculated and re-cached it.
After the hack, f()'s frame inherits the cached version. But this only
happens in a script, which runs its code in a single frame. If you try
this in interactive mode, you'll get correct behavior.
If function calls stay within a module, builtins is effectively frozen
at the value it had when the module started execution. But if outside
modules call those same functions, builtins will have its new value!
That could be bad:
import my_extra_special_builtins as __builtins__
<define extra-special library functions that use new builtins>
def run_tests_on_extra_special_functions():
    <tests, etc.>

if __name__ == '__main__':
    run_tests_on_extra_special_functions()
The special library functions work, but the tests don't. The special
builtins module only shows up when functions are called from outside
modules (where the call sites have different globals) and the functions'
frames are forced to recalculate builtins rather than inheriting it.
Here are some ways around the problem:
1. Put all the tests in a different module.
2. Use a unit testing framework, which will call the module
functions from outside the module.
3. Call functions using exec with custom globals.
4. Replace functions using types.FunctionType with custom globals.
#3 and #4 are decidedly unlikely. :) #1 is generally discouraged (AFAIK)
if not annoying, and #2 is encouraged.
In the last thread on __builtins__ vs. __builtin__, back in March, it
seemed that Guido was open to new ideas for Python 3.0 on the subject.
Well, keeping in mind this strange behavior and the length of time it's
gone on, here's my recommendation:
Kill __builtins__. Take it out of the module dict. Let LOAD_GLOBAL
look in "builtins" (currently "__builtin__") for names after it
checks globals. If modules want to hack at builtins, they can
import it. But they hack it globally or not at all.
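In Python terms, what I'm suggesting LOAD_GLOBAL should do is roughly
this (a sketch only; '__builtin__' is the module's current name):

import __builtin__                      # would just be "builtins" in 3.0

def load_global(globals, name):
    try:
        return globals[name]            # module scope first
    except KeyError:
        # then the builtins *module*, never a per-module __builtins__
        return getattr(__builtin__, name)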
I honestly can't think of a use case you can handle by replacing a
module's __builtins__ that can't be handled without it. If there is one,
nobody actually does it, because we would have heard them screaming in
agony and banging their heads against the walls from thousands of miles
away by now. You just can't do it reliably as of February 1998.
The regression test suite doesn't even touch things like this. It only
goes as far as injecting stuff into __builtin__.
Finally, on to my practical problem.
I'm working on the fast globals stuff, which is how I got onto this
subject in the first place. Here are a few of my options:
1. I can make __builtins__ work like it was always supposed to, at
the cost of decreased performance and extra complexity. It would
still be much faster than it is now, though.
2. Status quo: I can make __builtins__ work like it does now. I
think I can do this, anyway. It's actually more complex than #1,
and very likely slower. I would rather not take this route.
3. For a given function, I can freeze __builtins__ at the value it
was at when the function was defined.
4. I can make it work like I suggested for Python 3.0, but make
__builtin__ automatically available to modules as __builtins__.
With or without it, I should be posting my patch for fast globals soon.
No, don't look at me like that. I'm serious!
Wondering-what-to-do-ly,
Neil
gaah ... this should have been sent to the list for archiving.
The summary is that my memory was wrong, and items are *not* jostled
back to "better" locations.
-jJ
I have a hack coded up against r59068 in which LOAD_GLOBAL is even
faster than LOAD_FAST. It'll be the same with STORE_GLOBAL and the
*_NAME opcodes after I'm done with it, and it should be fully
transparent to Python code. (That is, you can go ahead and swap out
__builtins__ and crazy junk like that and everything should work as it
did before.) Regression tests all pass, except test_gc on functions -
I've got a refcount bug somewhere.
Here's the microbenchmark I've been using to test LOAD_GLOBAL and LOAD_FAST:
import timeit
import dis
def test_local_get():
    x = 0
    x; x; x; #... and 397 more of them

if __name__ == '__main__':
    print dis.dis(test_local_get.func_code)
    print timeit.Timer('test_local_get()',
                       'from locals_test import test_local_get').timeit()
The globals test puts 'x' in module scope, and the builtins test changes
'x' to 'len' and doesn't assign it to 0.
Output right now:
r59068 locals: 15.57 sec
myhack locals: 15.61 sec (increase is probably insignificant or random)
r59068 globals: 23.61 sec
myhack globals: 15.14 sec (!)
r59068 builtins: 28.08 sec
myhack builtins: 15.26 sec (!!)
Of course, it's no good if it slows everything else way the heck down.
So 10 rounds of pybench says:
r59068: mean 8.92, std 0.05
myhack: mean 8.99, std 0.04
From what I see in pybench, globals access is severely underrepresented
compared to real programs, so those numbers aren't representative of the
possible difference in real-life performance.
Jim Jewett gave me the idea here:
http://mail.python.org/pipermail/python-ideas/2007-November/001207.html
"Note that weakening the module.__dict__ promise to only meeting the
dict API would make it easier to implement the various speed-up-globals
suggestions."
I didn't exactly do that, but it did get me thinking. The other
proposals for speeding up globals access seemed to do their darndest to
leave PyDictObject alone and ended up hideously complicated because of
it. Here's the main idea for this one: What if a frame could maintain an
array of pointers right into a dictionary's entry table? A global lookup
would then consist of a couple of pointer dereferences, and any value
change would show up immediately to the frame.
There was a dangerous dangling pointer problem inherent in that, so I
formalized an update mechanism using an observer pattern.
Here's how it works. Arbitrary objects can register themselves with a
dictionary as "entry observers". The dictionary keeps track of all the
registered observers, and for certain events, makes a call to each one
to tell them that something has changed. The entry observers get
pointers to entries via PyDict_GetEntry, which is just like
PyDict_GetItem, except it returns a PyDictEntry * right from the
dictionary's entry table.
The dict notifies its observers on delitem, pop, popitem, resize and
clear. Nothing else is necessary - nothing else will change the address
of or invalidate an entry. There are very, very few changes in
PyDictObject. In the general case, the pointer to the list of observers
is NULL, and the only additional slowdown is when delitem, pop, popitem,
resize and clear check that and move on - but those aren't called often.
So get, set, iter, contains, etc., are all exactly as fast as they were
before. The biggest performance hit is when a highly-observed dict like
__builtin__.__dict__ resizes, but that's rare enough to not worry about.
To speed up globals access, an auxiliary object attached to functions
and frames registers itself as an observer of func_globals and
__builtins__. It
makes an array of PyDictEntry pointers corresponding to
func_code.co_names. PyEval_EvalFrameEx indexes that array first for
global values, and updates it if there's one it couldn't find when the
function was created.
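To make the shape of this clearer, here is a much-simplified pure-Python
model (the real thing works on PyDictEntry pointers in C; the class and
method names below are invented, and corner cases such as a global later
shadowing a builtin are glossed over):

class Entry(object):
    """Stands in for a PyDictEntry slot whose address stays stable."""
    def __init__(self, key, value):
        self.key, self.value = key, value

class ObservedDict(object):
    """Toy dict that tells registered observers when entries go away."""
    def __init__(self):
        self._entries = {}                   # key -> Entry
        self._observers = []

    def register_observer(self, obs):
        self._observers.append(obs)

    def get_entry(self, key):                # stand-in for PyDict_GetEntry
        return self._entries.get(key)

    def __setitem__(self, key, value):
        entry = self._entries.get(key)
        if entry is not None:
            entry.value = value              # cached entries see this at once
        else:
            self._entries[key] = Entry(key, value)

    def __delitem__(self, key):              # one of the few notifying events
        del self._entries[key]
        for obs in self._observers:
            obs.entries_invalidated(self)

class GlobalsCache(object):
    """Stand-in for the auxiliary object: caches entries for co_names."""
    def __init__(self, names, globals, builtins):
        self._names, self._globals, self._builtins = names, globals, builtins
        globals.register_observer(self)
        builtins.register_observer(self)
        self.entries_invalidated(None)

    def entries_invalidated(self, which):    # callback from the observed dicts
        self._cache = {}
        for name in self._names:
            self._cache[name] = (self._globals.get_entry(name) or
                                 self._builtins.get_entry(name))

    def load_global(self, name):             # roughly what LOAD_GLOBAL would do
        entry = self._cache.get(name)
        if entry is None:                    # added after the cache was built
            entry = (self._globals.get_entry(name) or
                     self._builtins.get_entry(name))
            self._cache[name] = entry
        if entry is None:
            raise NameError(name)
        return entry.value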
That's pretty much it. There are corner cases I still have to address,
like what happens if someone replaces or deletes __builtins__, but it
should be fairly easy to monitor that.
I'd love to hear your comments, everyone. I've glossed over a lot of
implementation details, but I've tried to make the main ideas clear.
Neil
Hi to all,
I would find it very useful to have a version of os.listdir that returns
a generator.
If a directory has many files, say 20,000, it could take a long time
getting all of them with os.listdir and this could be a problem in
asynchronous environments (e.g. asynchronous servers).
The only solutions that come to my mind in such a case are using a
thread/fork or having a non-blocking version of listdir() that returns
an iterator.
What do you think about that?
(Disclaimer: I have no issue with "self." and "super." attribute access,
which is what most people think of when they think "implicit self".)
While showing a coworker a bytecode hack I made this weekend - it allows
insertion of arbitrary function parameters into an already-existing
function - he asked for a use case. I gave him this:
class A(object):
    # ...
    def method(x, y):
        self.x = x
        super.method(y)
where 'method' is replaced by this method wrapper via metaclass or
decorator:
def method_wrapper(self, *args, **kwargs):
    return hacked_method(self, super(cls, self), *args, **kwargs)
These hackish details aren't important; the resulting "A.method" is.
It occurred to me that explicit self and implicit super is semantically
inconsistent. Here's Python 3000's version of the above (please compare):
class A(object):
    def method(self, x, y):
        self.x = x
        super.method(y)
Why have a magic "super" local but not a magic "self" local? From a
*general usage* standpoint, the only reason I can think of (though there
may well be others, which is why I'm asking) is that a person might want
to change the name of "self", like so:
class AddLike(object):
    # ...
    def __add__(a, b):
        # return something
    def __radd__(b, a):
        # return something
But reverse binary special methods are the only case where renaming self
isn't extremely bad form. Okay, so that's a second reason for explicit
self, backward compatibility, but 2to3 would make it a non-issue.
From an *implementation standpoint*, making self implicit - a cell
variable like super, for example - would wreak havoc with the current
bound/unbound method distinction, but I'm not so sure that's a bad thing.
Neil
I set out trying to redo the 3.0 autosuper metaclass in 2.5 without
bytecode hacking and ran into a problem: a function's func_globals isn't
polymorphic. That is, the interpreter uses PyDict_* calls to access it,
and in one case (LOAD_GLOBAL), actually inlines PyDict_GetItem manually.
If it weren't for this, I could have easily done 3.0 super without
bytecode hacking, by making a custom dict that allows another dict to
shadow it, and putting the new super object in the shadowing dict.
I know it's for performance, and that if func_globals were made
polymorphic, it'd bring the pystone benchmark to its knees, begging for
a quick and merciful death. That's not what I'm proposing.
I propose adding a read-only attribute func_extra_globals to the
function object, default NULL. In the interpreter loop, global lookups
try func_extra_globals first if it's not NULL. It's accessed using
PyObject_* functions.
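Purely as a sketch of how it might be used (nothing below exists today:
the extra_globals argument to types.FunctionType and the resulting
func_extra_globals attribute are the proposal):

import types

def with_extra_globals(**extra):
    """Rebuild a function so lookups see 'extra' before func_globals."""
    def decorator(func):
        return types.FunctionType(func.func_code, func.func_globals,
                                  func.func_name, func.func_defaults,
                                  func.func_closure,
                                  extra_globals=extra)  # hypothetical argument
    return decorator

@with_extra_globals(log=lambda msg: None)
def do_work(x):
    log("working")          # found in func_extra_globals, not the module
    return x * 2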
Here are the reasons I think this is a good idea:
- It should have near zero impact on performance in the general case
because NULL checks are quick. There would be another attribute in the
frame object (f_extra_globals), almost always NULL.
- Language enhancement prototypes that currently use bytecode hacking
could be accomplished with a method wrapper and a func_extra_globals
dict. The prototypes could be pure Python, and thus more general, less
brittle, and easier to get right. Hacking closures is nasty business.
- I'm sure there's lots of other stuff I can't think of, where it'd be nice
to dynamically add information to a method or function that can be
accessed as a variable. Pure-Python function preambles whose results can
be seen by the original function would be pretty sweet.
- Because func_extra_globals would be read-only and default NULL, it'd
almost always be obvious when it's getting messed with. A
wrapper/decorator or a metaclass, and a call to types.FunctionType()
would signal that.
- func_globals would almost never have to be overridden: for most
purposes (besides security), shadowing it is actually better, as it
leaves the function's module fully accessible.
Anybody else think it's awesome? :) How about opinions of major suckage?
If it helps acceptance, I'd be willing to make a patch for this. It
looks pretty straightforward.
Neil
Title says it all. Got used to += et al. My mind often expects augmented
assignment syntax to exist uniformly for whatever transform.
If I am not mistaken, Python's syntax doesn't permit augmented assignment
operators to sit between parens, so ')=' wouldn't risk confusing quick
machine or eye scans that match parens.
Cheers, BB
I'm not talking about having the runtime call the superclass __init__
for you, as I am aware of the arguments over it and I am against it
myself. I'm talking about checking whether it's been called within a
subclass's own __init__.
There are many kinds of objects with such complex underpinnings or
initialization that leaving out the call to superclass __init__ would be
disastrous. There are two situations I can think of where enforcing its
invocation could be useful: a corporate environment and a teaching
environment. (I've done the former and I'm working in the latter.)
If someone forgets to call a superclass __init__, problems may not show
up until much later. Even if they do show up immediately, it's almost
never obvious what the real problem is, especially to someone who is new
to programming or is working on someone else's code.
I've got a working prototype metaclass and class instance
(require_super) and decorator (super_required). Decorating a
require_super method with @super_required requires any subclass override
to call its superclass method, or a TypeError is raised upon exiting the
subclass method. Here's how it works on the __init__ problem:
class A(require_super):
    @super_required
    def __init__(self):
        pass

a = A()  # No problem

class B(A):
    def __init__(self):
        super(B, self).__init__()

b = B()  # No problem

class C(B):
    def __init__(self):
        pass  # this could be a problem

c = C()  # TypeError: C.__init__: no super call

class D(C):
    def __init__(self):
        super(D, self).__init__()

d = D()  # TypeError: C.__init__: no super call
As long as A.__init__ is eventually called, it doesn't raise a TypeError.
There's not much magic involved (as metaclasses go), just explicit and
implicit method wrappers, and no crufty-looking magic words in the
subclasses. Not calling the superclass method results in immediate
runtime feedback. I've tested this on a medium-small, real-life
single-inheritance hierarchy and it seems to work just fine. (I *think*
it should work with multiple inheritance.)
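For the record, here is a rough sketch of one way such a
metaclass/decorator pair could be written (this is only an illustration,
not the actual prototype; it assumes single inheritance and isn't
thread-safe):

def super_required(method):
    # mark the "root" method; subclass overrides must cause it to run
    method._super_root = True
    return method

class _RequireSuperType(type):
    def __new__(meta, name, bases, namespace):
        for attr, func in list(namespace.items()):
            if getattr(func, '_super_root', False):
                namespace[attr] = meta._wrap_root(func)
            elif callable(func) and any(
                    getattr(getattr(base, attr, None), '_super_root', False)
                    for base in bases):
                namespace[attr] = meta._wrap_override(func, name, attr)
        return type.__new__(meta, name, bases, namespace)

    @staticmethod
    def _wrap_root(func):
        def root(self, *args, **kwargs):
            self._super_called = True          # record that the root ran
            return func(self, *args, **kwargs)
        root._super_root = True                # so later overrides get wrapped
        return root

    @staticmethod
    def _wrap_override(func, clsname, attr):
        def override(self, *args, **kwargs):
            self._super_called = False
            result = func(self, *args, **kwargs)
            if not self._super_called:
                raise TypeError('%s.%s: no super call' % (clsname, attr))
            return result
        override._super_root = True            # propagate the requirement
        return override

require_super = _RequireSuperType('require_super', (object,), {})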
Two questions:
1. Is the original problem (missed superclass method calls) big enough
to warrant language, runtime, or library support for a similar solution?
2. Does anybody but me think this is a great idea?
Neil