Why not an __assign__ method?
Carlos Alberto Reis Ribeiro
cribeiro at mail.inet.com.br
Wed Apr 4 16:16:41 EDT 2001
At 15:49 02/04/01 -0400, Robin Thomas wrote:
>Best of luck once you get the source; I look forward to reading your
>discoveries and updated proposal.
Now I'm back from the source :-) Unfortunately, I don't have MS VC
installed, so I can't test any change. I tried to check my chances with two
options: BC++ and GCC/CygWin, with no success so far. BTW, why aren't these
two compilers supported? It seems that this question was answered before,
but I could find no conclusive answer. Some people have had success with
GCC/CygWin recently but I could not reproduce it here; maybe it's something
related to my installation. Anyway, back to the topic...
(Oops. I think I'm still offtopic here :-) But I'm not a member of the dev
list; maybe someone can point me to the procedure to follow for such
discussions)
Following Robin's hints, I took a look at the sources. Examining the
problem more closely, I saw that, while a partial solution may be easy, a
complete solution really needs a better-structured approach.
At first, I thought that there were two possible approaches for the
__assign__ implementation:
1) Modify the parser to insert a new opcode for the assign statement.
(this was not what I had in my mind at first)
2) Include some code on methods to detect "assign-like" behavior.
There are no changes to the parser.
To check if (2) is possible, I tried several constructs, and found two
opcodes that could be intercepted: STORE_NAME and STORE_SUBSCR. These
opcodes appear in the bytecode stream whenever an assignment takes place.
However, there are two other cases that may arise and that make it much
more difficult (in fact, for both of the proposed approaches).
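The two opcodes can be observed with the standard dis module. This is a minimal sketch, assuming a current CPython rather than the 2.1 beta discussed here; the helper name opnames is made up for illustration, and opcode names can differ between versions.

```python
import dis

def opnames(source):
    """Return the opcode names CPython emits for module-level `source`."""
    # Compiling does not execute anything, so undefined names are fine here.
    code = compile(source, "<sketch>", "exec")
    return [ins.opname for ins in dis.get_instructions(code)]

# Binding a plain name at module level.
name_ops = opnames("z = a + b")

# Storing into a container through a subscript.
subscr_ops = opnames("d['k'] = a + b")

print(name_ops)
print(subscr_ops)
```

Inside a function body the compiler emits STORE_FAST instead of STORE_NAME, so a hook placed only on STORE_NAME would miss local variables.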
Let's focus first on the simple case. I found this test in ceval.c, at line
1502 (Python 2.1b2a):
(...)
    case STORE_NAME:
        w = GETNAMEV(oparg);
        v = POP();
        if ((x = f->f_locals) == NULL) {
            PyErr_Format(PyExc_SystemError,
                         "no locals found when storing %s",
                         PyObject_REPR(w));
            break;
        }
        err = PyDict_SetItem(x, w, v);
        Py_DECREF(v);
        break;
(...)
In this case, my proposal is to insert a call to the object's __assign__
method (if it has one) immediately before the PyDict_SetItem call. This can
be done both for STORE_NAME and STORE_SUBSCR. It seems to solve the problem,
but unfortunately things are not so simple.
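The effect of such a hook can be imitated in pure Python without touching ceval.c. The following is only a sketch under assumptions: the class names and the signature __assign__(self, name) are hypothetical, and it relies on CPython routing name stores through __setitem__ when the locals mapping passed to exec is not a plain dict.

```python
class AssignHookNamespace(dict):
    """Namespace that notifies a value just before a name is bound to it."""

    def __setitem__(self, name, value):
        # Look the hook up on the type, as is done for other special methods.
        hook = getattr(type(value), "__assign__", None)
        if hook is not None:
            hook(value, name)  # hypothetical signature: __assign__(self, name)
        super().__setitem__(name, value)

class Tracked:
    log = []  # names this class's instances were bound to

    def __assign__(self, name):
        Tracked.log.append(name)

ns = AssignHookNamespace()
# Globals must be a real dict; the hooked mapping is passed as locals.
exec("z = Tracked()\nn = 42", {"Tracked": Tracked}, ns)
print(Tracked.log)
```

Only the binding of z triggers the hook; the integer bound to n has no __assign__ method and is stored as usual.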
THE PROBLEM...
There are other situations that make things harder, or even impossible :-(
- BUILD_TUPLE: the construct z = (a+b+c, ) builds a tuple with the
  intermediate object before assigning it to the name "z". In this
  case, all objects that are being put in the tuple would need to
  be "assigned". This is NOT a good idea for a lot of reasons. First
  of all, this was not exactly what I originally meant. Tuples may
  be built for a lot of reasons, even while the expression is being
  evaluated. Also there are performance concerns, because we would
  need to do this *every time* this opcode is executed, even for
  potentially large tuples.
- CALL_FUNCTION: a similar thing can happen when passing the result
  of an expression as a parameter to a function. In this case,
  things seem to be complicated by the LOAD_FAST/STORE_FAST opcodes,
  which access local variables directly by index in the frame
  instead of going through a dictionary.
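Both cases can be checked by disassembling small snippets. This is a sketch assuming a current CPython; the helper opnames is invented for illustration, and the call opcode has been renamed over the years (CALL_FUNCTION in older releases, CALL in newer ones), so the code only looks for names starting with "CALL".

```python
import dis

def opnames(source):
    """Return the opcode names CPython emits for module-level `source`."""
    code = compile(source, "<sketch>", "exec")
    return [ins.opname for ins in dis.get_instructions(code)]

# The intermediate result of a + b + c goes into the tuple first;
# only the tuple itself is stored under the name z.
tuple_ops = opnames("z = (a + b + c,)")

# The result of a + b is handed to f straight from the value stack.
call_ops = opnames("f(a + b)")

print(tuple_ops)
print(call_ops)
```

In the second snippet no STORE opcode appears at all, so a hook attached to the store opcodes never sees the intermediate object that is passed as the argument.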
In fact, these two problems affect both of the approaches that I was
proposing. So, it is impossible to use __assign__ in the way that I devised
at first. However, there is an alternative that I just began exploring, and
it may (or may not) make sense. First of all, let us restate the original
intention:
THE INTENTION
My intention is to devise a way to optimize operations by avoiding the
creation of new objects for every intermediate result. Such intermediate
objects are created inside the methods that execute the operations and
returned by them. With a little knowledge of the nature of the operands -
namely, whether at least one of the operands is an intermediate result valid
only in the context of the expression - it is possible for the operator
implementation code to re-use such an object, executing the operation
in-place. This is safe to do because it relies on some cooperation on the
operator's part; if the operation can't be done safely, then it's up to the
operator to create a new object and use it.
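A toy illustration of this kind of reuse is given below. It is only a sketch under assumptions: the class Vec, its temp flag and the created counter are invented for the example and are not part of any existing API.

```python
class Vec:
    """Toy vector; `temp` marks an object that is only an intermediate result."""

    created = 0  # counts every Vec ever constructed

    def __init__(self, data, temp=False):
        self.data = list(data)
        self.temp = temp
        Vec.created += 1

    def __add__(self, other):
        if self.temp:
            # Left operand is an intermediate result: reuse it in place.
            for i, x in enumerate(other.data):
                self.data[i] += x
            return self
        # Otherwise stay safe: build a new object and mark it temporary.
        return Vec((x + y for x, y in zip(self.data, other.data)), temp=True)

a, b, c = Vec([1, 2]), Vec([10, 20]), Vec([100, 200])
before = Vec.created
z = a + b + c
new_objects = Vec.created - before  # objects allocated by the expression
print(new_objects, z.data, a.data)
```

Here a + b allocates one temporary and the second addition reuses it, so the whole expression creates a single new object. Note that z is still flagged as temporary after the assignment, so a later z + a would modify z in place; something has to clear the flag once the expression is finished.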
THE PROPOSAL
The proposal now is to call a predefined method, on the object that is the
result of any expression, whenever the expression finishes being
calculated. The method to be called could be named either __fix__ (the
object is being "fixed" after the expression); or __result__ (indicating
that the expression was finally evaluated, and the object is the result of
the expression). For example, on:
z = a + b + c
... the method will be called on the resulting object, right before
assignment takes place.
If the expression is comprised of the object alone, the method will not be
called. For example,
z = a
... it's just the assignment, no "fix" required.
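The rule above can be prototyped at the source level with the ast module instead of a new opcode. This is a rough sketch under assumptions: the helper _fix, the class FixInserter and the demo class Num are hypothetical, and only plain assignment statements are handled, not every place where an expression can occur.

```python
import ast

class FixInserter(ast.NodeTransformer):
    """Wrap non-trivial right-hand sides of assignments in a call to _fix()."""

    def visit_Assign(self, node):
        # A bare name (z = a) is just an assignment: no "fix" is inserted.
        if not isinstance(node.value, ast.Name):
            node.value = ast.Call(
                func=ast.Name(id="_fix", ctx=ast.Load()),
                args=[node.value],
                keywords=[],
            )
        return node

def _fix(obj):
    # Call the hypothetical __fix__ hook if the object's type defines one.
    hook = getattr(type(obj), "__fix__", None)
    if hook is not None:
        hook(obj)
    return obj

class Num:
    fixed = 0  # how many times __fix__ has been called

    def __init__(self, v):
        self.v = v

    def __add__(self, other):
        return Num(self.v + other.v)

    def __fix__(self):
        Num.fixed += 1

def run(source, env):
    tree = FixInserter().visit(ast.parse(source))
    ast.fix_missing_locations(tree)  # give the new nodes line numbers
    exec(compile(tree, "<sketch>", "exec"), env)

env = {"_fix": _fix, "a": Num(1), "b": Num(2), "c": Num(3)}
run("z = a + b + c", env)
after_expr = Num.fixed
run("y = a", env)
after_name = Num.fixed
print(after_expr, after_name, env["z"].v)
```

The hook runs once for z = a + b + c and not at all for y = a, which matches the two examples above.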
This needs to be implemented by the compiler, and an extra opcode needs to
be inserted into the bytecode stream. It must be included whenever a
mathematical/logical/sequence expression takes place. I'm still checking
the details, but at least I now have a little bit more knowledge of the
scenario.
Carlos Ribeiro