
Hi Ben, for what you're interested in, you might give a look at WPython 1.0 ( https://code.google.com/archive/p/wpython/downloads ) and 1.1 ( https://code.google.com/archive/p/wpython2/downloads ), but they cover a lot of optimizations (as you can see from a brief look at the slides): RETURN_CONST and fusing some opcodes for binary operations are only some of them. For this reason, it's also very difficult to micro-benchmark every single change... :-/ Cheers, Cesare <https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail> Mail priva di virus. www.avast.com <https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail> <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> 2017-05-25 16:28 GMT+02:00 Ben Hoyt <benhoyt@gmail.com>:
Thanks, Victor. That's very helpful. So RETURN_NONE (and probably RETURN_SMALL_CONST) are not worth it, based on your empirical tests. Your patch shows how (relatively) straight-forward it is to test out new opcodes.
I'm still optimistic about the value of COMPARE_IS_NONE and COMPARE_IS_NOT_NONE, though. Mainly because I've done a quick expansion of LOAD_CONST(None) + COMPARE_OP and it's quite a bit more code and many more instructions than COMPARE_IS_NONE would be:
LOAD_CONST(None) COMPARE_OP PyObject *value = ((PyTupleObject *)(consts))->ob_item[oparg]; value->ob_refcnt++; *stack_pointer++ = value; FAST_DISPATCH(); PyObject *right = *--stack_pointer; PyObject *left = stack_pointer[-1]
// cmp_outcome(), presumably inlined int r = 0; switch (compare_oparg) { case PyCmp_IS: r = (left == right); break; case PyCmp_IS_NOT: r = (left != right); break; case ... } PyObject *res = r ? Py_True : Py_False; res->ob_refcnt++;
if (--(left)->ob_refcnt == 0) _Py_Dealloc(left); if (--(right)->ob_refcnt == 0) _Py_Dealloc(right); stack_pointer[-1] = res; if (res == NULL) goto error;
PREDICT(POP_JUMP_IF_FALSE); PREDICT(POP_JUMP_IF_TRUE); DISPATCH();
COMPARE_IS_NONE PyObject* left = stack_pointer[-1]; // TOP() PyObject* res = (left == Py_None) ? Py_True : Py_False; res->ob_refcnt++; if (--(left)->ob_refcnt == 0) _Py_Dealloc(left); stack_pointer[-1] = res; // SET_TOP(res) PREDICT(POP_JUMP_IF_FALSE); PREDICT(POP_JUMP_IF_TRUE); DISPATCH();
You don't have to get the const arg, there are fewer increfs/decrefs, you skip a pop, you don't have to test res==NULL (because it's Py_True or Py_False, which are never NULL), and if there are separate COMPARE_IS_NONE and COMPARE_IS_NOT_NONE you don't have to switch on the compare arg (though I'm not sure if that part will be worth it).
For reference, based on a grep, " is None" occurs 2737 times in the CPython source tree, and " is not None" 2010 times. And I know personally I often use them in loops as well is at the start of functions (for mutable default arg handling).
Still, the performance proof will be in the pudding! I might hack these two opcodes together and test it at some point.
-Ben
On Thu, May 25, 2017 at 6:47 AM, Victor Stinner <victor.stinner@gmail.com> wrote:
Hi Ben,
I am not convinced that combining operations will have a significant impact in term of performance. Mark Shanon implemented that in his HotPy project.
I proposed a RETURN_NONE opcode to combine LOAD_CONST with RETURN_VALUE. The issue was rejected because I failed to show any speedup.
https://bugs.python.org/issue28800
I would be interested to restart/finish my registervm project to use register-based bytecode. It allows to implement more optmisations and reduce the number of instructions. In my experience, less instructions = faster code.
http://faster-cpython.readthedocs.io/registervm.html
Mark's bytecode uses registers but also a stack.
Victor
Le 24 mai 2017 8:09 PM, "Ben Hoyt" <benhoyt@gmail.com> a écrit :
Hi folks,
I was looking at some `dis` output today, and I was wondering if anyone has investigated optimizing Python (slightly) by adding special-case bytecodes for common expressions or statements involving constants?
For example, I (and, based on a quick grep of the stdlib, many others) write "x is None" and "x is not None" very often. Or "return True" or "return None" or "return 1" and things like that. These all expand into two bytecodes, which seems pretty non-optimal (LOAD_CONST + COMPARE_OP or LOAD_CONST + RETURN_VALUE). It seems we could get an easy speedup for these common cases by adding a peephole optimization and some new opcodes (maybe COMPARE_IS_SMALL_CONST and RETURN_SMALL_CONST for these cases).
I'm not proposing to do this yet, as I'd need to benchmark to see how much of a gain (if any) it would amount to, but I'm just wondering if there's any previous work on this kind of thing. Or, if not, any other thoughts before I try it?
Thanks, Ben
_______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailma n/options/python-dev/victor.stinner%40gmail.com
_______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/ cesare.di.mauro%40gmail.com