relative speed of incremention syntaxes (or "i=i+1" VS "i+=1")

Mon Aug 22 00:14:37 EDT 2011

On Aug 21, 10:27 am, Andreas Löscher <andreas.loesc... at s2005.tu-
chemnitz.de> wrote:
>
> from Python/ceval.c:
>
> 1316            case BINARY_ADD:
> 1317                w = POP();
> 1318                v = TOP();
> 1319                if (PyInt_CheckExact(v) && PyInt_CheckExact(w)) {
> 1320                    /* INLINE: int + int */
> 1321                    register long a, b, i;
> 1322                    a = PyInt_AS_LONG(v);
> 1323                    b = PyInt_AS_LONG(w);
> 1324                    /* cast to avoid undefined behaviour
> 1325                       on overflow */
> 1326                    i = (long)((unsigned long)a + b);
> 1327                    if ((i^a) < 0 && (i^b) < 0)
> 1328                        goto slow_add;
> 1329                    x = PyInt_FromLong(i);
> 1330                }
> 1331                else if (PyString_CheckExact(v) &&
> 1332                         PyString_CheckExact(w)) {
> 1333                    x = string_concatenate(v, w, f, next_instr);
> 1334                    /* string_concatenate consumed the ref to v */
> 1335                    goto skip_decref_vx;
> 1336                }
> 1337                else {
> 1338                  slow_add:
> 1339                    x = PyNumber_Add(v, w);
> 1340                }
> 1341                Py_DECREF(v);
> 1342              skip_decref_vx:
> 1343                Py_DECREF(w);
> 1344                SET_TOP(x);
> 1345                if (x != NULL) continue;
> 1346                break;
>
> 1532            case INPLACE_ADD:
> 1533                w = POP();
> 1534                v = TOP();
> 1535                if (PyInt_CheckExact(v) && PyInt_CheckExact(w)) {
> 1536                    /* INLINE: int + int */
> 1537                    register long a, b, i;
> 1538                    a = PyInt_AS_LONG(v);
> 1539                    b = PyInt_AS_LONG(w);
> 1540                    i = a + b;
> 1541                    if ((i^a) < 0 && (i^b) < 0)
> 1542                        goto slow_iadd;
> 1543                    x = PyInt_FromLong(i);
> 1544                }
> 1545                else if (PyString_CheckExact(v) &&
> 1546                         PyString_CheckExact(w)) {
> 1547                    x = string_concatenate(v, w, f, next_instr);
> 1548                    /* string_concatenate consumed the ref to v */
> 1549                    goto skip_decref_v;
> 1550                }
> 1551                else {
> 1552                  slow_iadd:
> 1553                    x = PyNumber_InPlaceAdd(v, w);
> 1554                }
> 1555                Py_DECREF(v);
> 1556              skip_decref_v:
> 1557                Py_DECREF(w);
> 1558                SET_TOP(x);
> 1559                if (x != NULL) continue;
> 1560                break;
>
> For integers, the first branch (lines 1319 and 1535) is taken, and there
> is no difference in the code. However, Python uses a huge switch-case
> construct to execute its opcodes, and INPLACE_ADD comes after
> BINARY_ADD, hence the difference in speed.

That fragment of ceval.c is from a 2.x version. Python 2.x supports
both a PyInt and a PyLong type, and the ceval loop optimizes only the
PyInt case. On my system, I could not measure a difference between
binary and in-place addition.
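
A rough way to repeat this kind of measurement is the standard timeit
module. This is only a sketch: the helper name time_increment and the
run counts are arbitrary choices, and the numbers depend on the
interpreter version, build and machine, so no particular result should
be expected.

```python
import timeit

def time_increment(stmt, number=200_000, repeat=5):
    """Return the best time in seconds for `number` executions of stmt.

    Taking the minimum over several repeats reduces noise from other
    processes; `i = 0` in the setup makes `i` a local of the timed loop.
    """
    return min(timeit.repeat(stmt, setup="i = 0", number=number, repeat=repeat))

binary = time_increment("i = i + 1")
inplace = time_increment("i += 1")

print("i = i + 1: %.4f s" % binary)
print("i += 1   : %.4f s" % inplace)
```

Any difference smaller than the run-to-run variation is best treated as
noise.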

Python 3.x behaves differently:

        TARGET(BINARY_ADD)
            w = POP();
            v = TOP();
            if (PyUnicode_CheckExact(v) &&
                     PyUnicode_CheckExact(w)) {
                x = unicode_concatenate(v, w, f, next_instr);
                /* unicode_concatenate consumed the ref to v */
                goto skip_decref_vx;
            }
            else {
                x = PyNumber_Add(v, w);
            }
            Py_DECREF(v);
          skip_decref_vx:
            Py_DECREF(w);
            SET_TOP(x);
            if (x != NULL) DISPATCH();
            break;

        TARGET(INPLACE_ADD)
            w = POP();
            v = TOP();
            if (PyUnicode_CheckExact(v) &&
                     PyUnicode_CheckExact(w)) {
                x = unicode_concatenate(v, w, f, next_instr);
                /* unicode_concatenate consumed the ref to v */
                goto skip_decref_v;
            }
            else {
                x = PyNumber_InPlaceAdd(v, w);
            }
            Py_DECREF(v);
          skip_decref_v:
            Py_DECREF(w);
            SET_TOP(x);
            if (x != NULL) DISPATCH();
            break;
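
The two opcodes can also be inspected from Python with the dis module.
A small sketch, with hypothetical function names binary and inplace;
the opcode names printed depend on the interpreter version (BINARY_ADD
and INPLACE_ADD in the versions discussed here, while newer CPython
releases emit a single BINARY_OP with different arguments):

```python
import dis

def binary(i):
    i = i + 1      # compiled to the binary-add instruction
    return i

def inplace(i):
    i += 1         # compiled to the in-place-add instruction
    return i

# Print the bytecode of each function for comparison.
dis.dis(binary)
dis.dis(inplace)
```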

ceval just calls PyNumber_Add or PyNumber_InPlaceAdd. If you look at
the code for PyNumber_InPlaceAdd (in abstract.c), it calls an internal
function binary_iop1 with pointers to nb_inplace_add and nb_add.
binary_iop1 then checks if nb_inplace_add exists. The PyLong type does
not implement nb_inplace_add, so the check fails and binary_iop1 uses
nb_add.
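
The same fallback is visible at the Python level, where __iadd__
corresponds to nb_inplace_add and __add__ to nb_add. The classes
AddOnly and WithInplace below are made up for illustration; they are
not part of CPython or gmpy.

```python
class AddOnly:
    """Defines __add__ only, like int: there is no in-place slot."""
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        return AddOnly(self.value + other)   # always a new object

class WithInplace(AddOnly):
    """Also defines __iadd__, so += can update the object itself."""
    def __iadd__(self, other):
        self.value += other
        return self

a = AddOnly(1)
a_before = a
a += 1          # no __iadd__: falls back to __add__, rebinds a

b = WithInplace(1)
b_before = b
b += 1          # __iadd__ found: same object is updated

print(hasattr(int, "__iadd__"))   # int has no in-place add in Python 3
print(a is a_before)              # False: a new object was created
print(b is b_before)              # True: updated in place
```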

In recent versions of gmpy and gmpy2, I implemented the nb_inplace_add
function and performance (for the gmpy.mpz type) is much better for
the in-place addition.

For the adventuresome, gmpy2 implements a mutable integer type called
xmpz. It isn't much faster until the values are so large that the
memory copy times become significant. (Some old gmpy documentation
implies that operations with mutable integers should be much faster.
With aggressive caching of deleted objects, the object creation
overhead is very low. So the big win for mutable integers is reduced
to avoiding memory copies.)

casevh

>
> To be clear, this is nothing you should consider when writing fast code.
> Complexity-wise, they are both the same.