I've been thinking about some ideas for reducing the
amount of refcount adjustment that needs to be done,
with a view to making GIL removal easier.
1) Permanent objects
In a typical Python program there are many objects
that are created at the beginning and exist for the
life of the program -- classes, functions, literals,
etc. Refcounting these is a waste of effort, since
they're never going to go away.
So perhaps there could be a way of marking such
objects as "permanent" or "immortal". Any refcount
operation on a permanent object would be a no-op,
so no locking would be needed. This would also have
the benefit of eliminating any need to write to the
object's memory at all when it's only being read.
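A toy Python model of the "permanent" idea. It does not show how CPython's C headers work. ObjectHeader, incref, decref and the permanent flag are all hypothetical names. The point is only that a flagged object returns early from every refcount operation, so its header is never written.

```python
# Toy model (hypothetical names): a permanent object's refcount is never
# touched, so no lock and no memory write is needed for it.

class ObjectHeader:
    """Hypothetical stand-in for a C object header."""

    def __init__(self, permanent=False):
        self.refcount = 1           # the creating reference
        self.permanent = permanent  # set once at creation, never changed
        self.freed = False


def incref(obj):
    if obj.permanent:
        return                      # no-op: nothing to lock, nothing written
    obj.refcount += 1


def decref(obj):
    if obj.permanent:
        return                      # permanent objects are never deallocated
    obj.refcount -= 1
    if obj.refcount == 0:
        obj.freed = True            # stands in for deallocation


if __name__ == "__main__":
    cls_obj = ObjectHeader(permanent=True)  # e.g. a class created at start-up
    temp = ObjectHeader()                   # e.g. an intermediate result
    for _ in range(3):
        incref(cls_obj)
        decref(cls_obj)
    decref(temp)
    print(cls_obj.refcount, cls_obj.freed)  # 1 False
    print(temp.refcount, temp.freed)        # 0 True
```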
2) Objects owned by a thread
Python code creates and destroys temporary objects
at a high rate -- stack frames, argument tuples,
intermediate results, etc. If the code is executed
by a thread, those objects are rarely if ever seen
outside of that thread. It would be beneficial if
refcount operations on such objects could be carried
out by the thread that created them without locking.
To achieve this, two extra fields could be added
to the object header: an "owning thread id" and a
"local reference count". (The existing refcount
field will be called the "global reference count"
in what follows.)
An object created by a thread has its owning thread
id set to that thread. When adjusting an object's
refcount, if the current thread is the object's owning
thread, the local refcount is updated without locking.
If the object has no owning thread, or belongs to
a different thread, the object is locked and the
global refcount is updated.
The object is considered garbage only when both
refcounts drop to zero. Thus, after a decref, both
refcounts would need to be checked to see if they
are zero. When decrementing the local refcount and
it reaches zero, the global refcount can be checked
without locking, since a zero will never be written
to it until it truly has zero non-local references
remaining.
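Here is a sketch of the owning-thread scheme, modelled in Python with hypothetical names (OwnedObject, incref, decref). The owner updates local_refs without a lock. Other threads take the lock and update global_refs. The object counts as garbage only when both counts are zero. The sketch ignores one case that a real design would need an extra handshake for: the owner and another thread bringing both counts to zero at the same moment.

```python
import threading


class OwnedObject:
    """Hypothetical header with an owning thread id plus local and global refcounts."""

    def __init__(self):
        self.owner = threading.get_ident()  # the creating thread owns the object
        self.local_refs = 1                 # the creator's own reference
        self.global_refs = 0                # references held by other threads
        self.lock = threading.Lock()        # guards global_refs only
        self.freed = False


def incref(obj):
    if threading.get_ident() == obj.owner:
        obj.local_refs += 1                 # owning thread: no locking
    else:
        with obj.lock:                      # no owner, or a different thread
            obj.global_refs += 1


def decref(obj):
    if threading.get_ident() == obj.owner:
        obj.local_refs -= 1
        # As proposed, global_refs is read without the lock: zero is only
        # written to it once no non-local references remain.
        if obj.local_refs == 0 and obj.global_refs == 0:
            obj.freed = True
    else:
        with obj.lock:
            obj.global_refs -= 1
            if obj.global_refs == 0 and obj.local_refs == 0:
                obj.freed = True


if __name__ == "__main__":
    obj = OwnedObject()
    incref(obj)                                   # local: 2
    worker = threading.Thread(target=lambda: (incref(obj), decref(obj)))
    worker.start()
    worker.join()                                 # global: 1, then back to 0
    decref(obj)
    decref(obj)                                   # local reaches 0 -> garbage
    print(obj.local_refs, obj.global_refs, obj.freed)  # 0 0 True
```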
I suspect that these two strategies together would
eliminate a very large proportion of refcount-related
activities requiring locking, perhaps to the point
where those remaining are infrequent enough to make
GIL removal practical.
--
Greg
Ok, my idea of a temporary "replace" attribute wrapped around a reload-like function is not a good idea for Python given that things can be dynamically added to modules at any time.
---------------------------------------
Here is my original high level design:
The way I implemented this feature in Slick-C is with indirection. In Python terms, this means that a separate data structure that isn't reference counted holds the method/function object data, and the method/function object is changed to just contain a pointer to it. The data structure which holds all method/function data should probably be a non-reference-counted dictionary. When a function is deleted, its name remains in the dictionary, but the entry needs to be changed to indicate that it is "null/invalid". When a deleted function is called, an exception should be raised. Adding a function/method means replacing the data in the dictionary. This type of implementation is simple. There's an insignificant amount of overhead on a function/method call (i.e. instead of "func->data" you have "func=*pfunc;if ( func->isInvalid() ) throw exception; else func->data").
Technically this algorithm leaks memory, since deleted functions/methods are never removed. My response is: who cares? When the interpreter's clean-up-everything function is called, you simply deallocate everything in the dictionary.
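Here is a rough Python model of the indirection design. FunctionTable, Stub, define, delete, lookup and clear are hypothetical names. Choosing NameError for a deleted function is also an assumption. Callers keep a stub that looks up the current implementation by name on every call. Redefining a function replaces the table entry. Deleting it keeps the name but marks the entry invalid. A plain Python dict is itself reference counted, so this only mirrors the lookup logic, not the "non-reference-counted" storage.

```python
_INVALID = object()  # sentinel marking a deleted function


class FunctionTable:
    """Hypothetical name -> implementation table shared by all references."""

    def __init__(self):
        self._entries = {}

    def define(self, name, impl):
        self._entries[name] = impl       # adding or redefining replaces the data
        return Stub(self, name)

    def delete(self, name):
        self._entries[name] = _INVALID   # keep the name, mark the entry invalid

    def lookup(self, name):
        impl = self._entries.get(name, _INVALID)
        if impl is _INVALID:
            raise NameError(f"function {name!r} has been deleted")
        return impl

    def clear(self):
        self._entries.clear()            # interpreter shutdown: free everything


class Stub:
    """What callers hold: just the table and a name, resolved on each call."""

    __slots__ = ("_table", "_name")

    def __init__(self, table, name):
        self._table = table
        self._name = name

    def __call__(self, *args, **kwargs):
        return self._table.lookup(self._name)(*args, **kwargs)


if __name__ == "__main__":
    table = FunctionTable()
    callback = table.define("greet", lambda: "hello")  # e.g. stored by a dialog
    table.define("greet", lambda: "hi")                # "reload" the function
    print(callback())                                  # hi
    table.delete("greet")
    try:
        callback()
    except NameError as exc:
        print(exc)                                     # function 'greet' has been deleted
```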
---------------------------------
Instead of a temporary "replace" attribute wrapped into a reload-like call, how about giving modules a user-settable "replace" attribute? At any time, the user can set/reset this attribute. It would specify how the user wants functions/methods processed: either always added or always replaced. The "replace" attribute would likely need to be passed through to function objects, class objects, and method objects. For the macro language scenarios, I would just mark every module that got loaded with this attribute.
The proposed implementation I have given is intended to be very "file"-oriented (which maps to a Python module).
Would this work in the current code base?
I'm assuming the following:
When a function is added/executed, the module structure is accessible.
When a class is added (i.e. class myclass), the module structure is accessible.
When a method is added/executed, at least the class structure is accessible?
I hope you see where I'm going here. The executed "class myclass" code which defines a new class can copy the module "replace" attribute. The executed "def myfunction" code which defines a new method can copy the class "replace" attribute.
The function/method object structure could remain the same except for the addition of a new function/method pointer member.
The additional code for a function call would look like this:
// Did this function get defined in "replace" mode?
if ( func->doReplace() ) {
    // For this one, use the indirect pointer and not the other member data.
    func = func->pfunc;
    if ( !func->isValid() ) {
        // Set a Python exception (not a C++ exception) and return.
        return NULL;  // e.g. in a function returning PyObject*, the usual CPython error return
    }
}
// Now do what we used to do
Given the OO nature of Python, a separate function/method type for a replaceable function/method could be defined, but I suspect it isn't worth the effort. The above pseudocode is very efficient: "doReplace" would probably be just a boolean/int member, and the "isValid" call would be efficient as well.
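The sketch below models the replace-mode call path in Python. It is not how CPython function objects are built. Module, Slot, Function, def_function and del_function are hypothetical names, and raising RuntimeError for a deleted function is an assumption. Each function copies its module's replace flag when it is defined. Only functions defined in replace mode pay for the extra flag test and the indirect lookup.

```python
class Module:
    """Hypothetical module with a user-settable replace flag."""

    def __init__(self, name, replace=False):
        self.name = name
        self.replace = replace   # e.g. module.replace = True
        self.slots = {}          # name -> Slot shared by every reference


class Slot:
    """The indirect target ("pfunc"): current implementation plus validity."""

    def __init__(self, impl):
        self.impl = impl
        self.valid = True


class Function:
    def __init__(self, module, name, impl):
        self.do_replace = module.replace  # copied from the module at definition
        self.impl = impl
        self.pfunc = None
        if self.do_replace:
            slot = module.slots.get(name)
            if slot is None:
                slot = module.slots[name] = Slot(impl)
            else:
                slot.impl, slot.valid = impl, True  # redefinition updates the slot
            self.pfunc = slot

    def __call__(self, *args, **kwargs):
        impl = self.impl
        if self.do_replace:                   # the one extra flag test
            if not self.pfunc.valid:
                raise RuntimeError("function was deleted")
            impl = self.pfunc.impl            # use the indirect pointer instead
        return impl(*args, **kwargs)          # "now do what we used to do"


def def_function(module, name, impl):
    """Stands in for executing a "def" statement inside the module."""
    return Function(module, name, impl)


def del_function(module, name):
    slot = module.slots.get(name)
    if slot is not None:
        slot.valid = False
```

When the module has `replace=True`, an old reference such as `old = def_function(m, "run", f1)` starts calling the new code after a second `def_function(m, "run", f2)`. In a module without the flag, each function object keeps its original code.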
One thing my proposed implementation does not cover is adding new data members to a class. I think it is acceptable for this not to be handled.
Please shoot down this high-level implementation if it won't work in the current code base.
Also, what does everyone think about the idea of some sort of "replace" attribute for the module? How should it get set? "import module; module.replace=1". I'm probably showing a little lack of knowledge here. Teach me and I'll get it.
Greg Ewin wrote:
> What you suggest sounds like it ought to be possible,
> at first sight, since Python function objects are already
> containers with a reference to another object that
> holds the function's code.
> The problem will be figuring out *when* you're redefining
> a function, because the process of loading a module is a
> very dynamic one in Python. Defining functions and classes
> is done by executing code, not by statically analysing
> declarations as a C compiler does.
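Greg's observation that a function object already holds its code through a separate object can be checked directly in CPython. A function's `__code__` attribute is writable, so swapping it changes what every existing reference to that function object runs. The function names below are only for illustration. The assignment is rejected with ValueError if the new code needs a different number of closure (free) variables. Defaults and globals are not updated by the swap.

```python
def greet():
    return "old"


callback = greet              # e.g. a reference stored by a key binding


def greet_v2():               # what re-executing the module would produce
    return "new"


greet.__code__ = greet_v2.__code__   # keep the function object, swap its code
print(callback())                    # new
```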
My implementation is definitely a high-level sketch.
Greg,
The kinds of issues you are bringing up are exactly what I'm looking for. If there are more, let's see them.
Would temporarily marking the module with "replace" work? I would think that when the function is defined, it has access to the module (because it is adding to its dictionary) and it could check for the "replace" attribute. I'm assuming a certain sequence of execution here, since the "replace" attribute would have to be removed after the function/method code was executed/loaded. If anyone knows that this isn't the case, please shoot this down.
Another post I read proposed a Reclass feature that only worked for classes. Given the macro language scenario, you definitely need functions too.
I'm still new to the technical abilities of Python, so help me if I misunderstand the current capabilities.
I'd like to see the reload feature of Python enhanced so it can replace the methods for existing class instances, references to methods, and references to functions.
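Some of this can be approximated by hand in today's Python, at least as a sketch. update_class_in_place is a hypothetical helper, not a standard-library function. Existing instances look methods up through their class, so updating the old class object in place changes their behaviour. Functions present in both versions keep their identity and only get the new `__code__`, which means bound methods captured earlier pick up the change as well. Data attributes are copied as-is. The code swap fails with ValueError when old and new versions differ in closure variables, for example when only one of them uses zero-argument super().

```python
import types


def update_class_in_place(old_cls, new_cls):
    """Hypothetical helper: move new_cls's definitions onto old_cls."""
    for name, value in vars(new_cls).items():
        old_value = vars(old_cls).get(name)
        if isinstance(value, types.FunctionType):
            if isinstance(old_value, types.FunctionType):
                old_value.__code__ = value.__code__  # existing bound methods see it
            else:
                setattr(old_cls, name, value)        # a newly added method
        elif not (name.startswith("__") and name.endswith("__")):
            setattr(old_cls, name, value)            # skip __dict__, __module__, etc.


if __name__ == "__main__":
    class Dialog:
        def title(self):
            return "old"

    d = Dialog()          # e.g. an open modeless dialog
    bound = d.title       # e.g. a stored callback

    class DialogV2:       # what a reload would produce
        def title(self):
            return "new"

        def extra(self):
            return 42

    update_class_in_place(Dialog, DialogV2)
    print(bound(), d.extra())   # new 42
```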
Here's the scenario. Let's say you want to use Python as a macro language. Currently, you can bind a Python function to a key or menu (better to do it by name and not by reference). That's what most apps need. However, an advanced app like SlickEdit would have class instances for modeless dialogs (including tool windows) and other data structures. There are also callbacks which would preferably be references to functions or methods. With the current implementation you would have to close and reopen dialogs. In other cases, you would need to exit SlickEdit and restart. While there will always be cases where this is necessary, I can tell you from experience that this is a great feature to have, since Slick-C has it.
I suspect that there are other scenarios that users would like this capability for.
Java and C# support something like this to a limited extent when you are debugging.
This capability could be a reload option. There could be cases where one might want the existing instances to use the old implementation. It wouldn't need to be an option for me. There will always be cases where you have to restart because you made too many changes.
I'm glad to hear it isn't a matter of whether it was useful or not.
The way I implemented this feature in Slick-C is with indirection. In Python terms, this means that a separate data structure that isn't reference counted holds the method/function object data, and the method/function object is changed to just contain a pointer to it. The data structure which holds all method/function data should probably be a non-reference-counted dictionary. When a function is deleted, its name remains in the dictionary, but the entry needs to be changed to indicate that it is "null/invalid". When a deleted function is called, an exception should be raised. Adding a function/method means replacing the data in the dictionary. This type of implementation is simple. There's an insignificant amount of overhead on a function/method call (i.e. instead of "func->data" you have "func=*pfunc;if ( func->isInvalid() ) throw exception; else func->data").
Technically this algorithm leaks memory, since deleted functions/methods are never removed. My response is: who cares? When the interpreter's clean-up-everything function is called, you simply deallocate everything in the hash table.
Does anyone know what level of effort would be needed for something like this?
Is my proposed implementation a good one for Python?
I guess this has very few to zero chances of being considered, even for
Python 3, but this being python-ideas I guess it's ok to bring it up. IMO
the del statement is one of the relatively few constructs that stick out
like a sore thumb. For one thing, it is overloaded to mean three different
things:
1) del x: Remove x from the current namespace
2) del x[i]: Equivalent to x.__delitem__(i)
3) del x.a: Equivalent to x.__delattr__('a') and delattr(x,'a')
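To make the three forms concrete, the small recorder class below (a hypothetical name) logs which special method each spelling reaches. `del r[i]` and `r.__delitem__(i)` end up in the same method. So do `del r.a` and `delattr(r, "a")`. Only the bare `del x` has no method behind it, because it removes a binding from the namespace.

```python
class Recorder:
    """Logs the special method each form of deletion ends up calling."""

    def __init__(self):
        self.calls = []

    def __delitem__(self, key):
        self.calls.append(("__delitem__", key))

    def __delattr__(self, name):
        self.calls.append(("__delattr__", name))


r = Recorder()
del r[0]              # form 2
r.__delitem__(1)      # the method it stands for
del r.a               # form 3
delattr(r, "b")       # the equivalent builtin
print(r.calls)
# [('__delitem__', 0), ('__delitem__', 1), ('__delattr__', 'a'), ('__delattr__', 'b')]

x = 1
del x                 # form 1: unbinds the name; no method is involved
```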
Here I am mostly arguing for removing the last two; the first could also be
removed if/when Python gets block namespaces, but it is orthogonal to the
others. I don't see the point of complicating the lexer and the grammar with
an extra keyword and statement for something that is typically handled by a
method (my preference), or at least a generic (builtin) function like len().
The last case is especially superfluous given that there is both a special
method and a generic builtin (delattr) that do the same thing. Neither
item nor attribute deletion is so pervasive as to be granted special treatment
at the language level.
I wonder if this was considered and rejected in the Py3K discussions; PEP
3099 doesn't mention anything about it.
George
Hello,
I've been following this discussion. My thoughts mostly reiterate what
has already been said. There's no way to get rid of the GIL without
significantly affecting single-threaded performance. IMO getting rid of
the GIL would require writing a mark-and-sweep algorithm. To improve
performance you can do incremental (threaded) marking and detect page
faults so that modified pages can be rescanned for references. The Boehm
garbage collector does this (I think) but Python would need something
much more custom. This type of garbage collector is VERY hard to write.
Worse yet, the current implementation of Python would need a lot of
rewriting.
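For readers unfamiliar with the term, the two phases of a basic stop-the-world mark-and-sweep collector can be sketched in a few lines of Python. Obj, mark, sweep and collect are hypothetical names. The incremental marking and page-fault tracking described above are deliberately left out. Those are what make a production collector hard to write.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []        # outgoing references
        self.marked = False


def mark(roots):
    """Phase 1: flag everything reachable from the roots."""
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj.marked:
            continue
        obj.marked = True
        stack.extend(obj.refs)


def sweep(heap):
    """Phase 2: keep marked objects, drop the rest, reset marks."""
    live = [o for o in heap if o.marked]
    for o in live:
        o.marked = False
    return live


def collect(heap, roots):
    mark(roots)
    return sweep(heap)


if __name__ == "__main__":
    a, b, c, d = (Obj(n) for n in "abcd")
    a.refs.append(b)
    c.refs.append(d)
    d.refs.append(c)          # an unreachable cycle: plain refcounting can't free it
    heap = collect([a, b, c, d], roots=[a])
    print([o.name for o in heap])   # ['a', 'b']
```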
FYI: I tried using the Boehm collector in SlickEdit and it leaked memory
like crazy. I never figured out why but I suspect it had to do with it
treating everything in memory as a potential pointer.
Ruby's mark-and-sweep garbage collector illustrates the loss in
single-threaded performance, and since it does its own thread
scheduling, its threaded performance is bad too.
As Python stands right now, its performance is excellent for single
threading, the implementation is simple, it works well for the typical
Python user, and using processes at least gives a work around. I like
to be a perfectionist as much as the next guy, but the payback doesn't
warrant the level of effort. Where's the easy button when you need one? :)
I thought you Python enthusiasts (especially Guido) might enjoy the
article I just posted on the SlickEdit blog. I'm the CTO and founder of
SlickEdit. I hate saying that because I'm a very humble guy but I
thought you would want to know. The article is called "Comparing Python
to Perl and Ruby", go to http://blog.slickedit.com/. I limited the
article to a simple grammar comparison because I wanted to keep the
article short. Hope you enjoy it.
Guido, I have another article written which talks about Python as well
but I have not yet posted it. If you give me an email address, I will
send it to you to look over before I post it. Don't give me your email
address here. Instead write to support(a)slickedit.com and let them know
that I requested your email address.
Cheers
Clark
Hi Arnaud
| If you want to do it like this, why not do it explicitly:
|
| def exhaust(iterable):
|     for i in iterable: pass
|
| Then you can write:
|
| exhaust(f(x) for x in mylist)
Thanks - that's nice. It also gives me the generality I wanted, which was
the ability to use the full LC/genexp "for..." syntax, which I should have
emphasized more, including in the subject of the thread.
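Here is a runnable version of the quoted helper, next to a variant based on the `consume` recipe in the itertools documentation. Feeding an iterable to `collections.deque` with `maxlen=0` runs it to completion and stores nothing. Both accept any generator expression, filters included, which covers the full "for..." syntax mentioned above.

```python
from collections import deque


def exhaust(iterable):
    """Run an iterable to completion, discarding every value."""
    for _ in iterable:
        pass


def consume(iterable):
    """Same effect: a deque with maxlen=0 keeps no items."""
    deque(iterable, maxlen=0)


seen = []
exhaust(seen.append(x) for x in range(3))
consume(seen.append(x * 10) for x in range(3) if x)
print(seen)   # [0, 1, 2, 10, 20]
```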
Terry
Hi Greg
| The way things are, there is only one coding style for when you don't want
| the results. You're suggesting the addition of another one. That *would* be
| un-Pythonic.
But the same remark could be made about using a list and writing explicit
loops to accumulate results, and the later addition of list comprehensions.
Wasn't that un-Pythonic for the same reason?
Terry
What's the most compact way to repeatedly call a function on a list without
accumulating the results?
While I can accumulate results via
a = [f(x) for x in mylist]
or with a generator, there doesn't seem to be a way to do this without
accumulating the results. I guess I need to either use the above and ignore
the result, or use
for x in mylist:
    f(x)
I run into this need quite frequently. If I write
[f(x) for x in mylist]
with no assignment, will Python notice that I don't want the accumulated
results and silently toss them for me?
A possible syntax change would be to allow the unadorned
f(x) for x in mylist
And raise an error if someone tries to assign to this.
Terry