[pypy-dev] Poor performance with custom bytecode

Timothy Baldridge tbaldridge at gmail.com
Fri Feb 17 14:03:22 CET 2012


Last night I finally got enough of core.clj implemented to run a
basic factorial program in clojure-py
(https://github.com/halgari/clojure-py). The project uses byteplay to
generate bytecode for Clojure routines, so the entire language runs on
the Python VM. In general I've been very impressed with PyPy's
performance, but this factorial program takes about 2x longer to
complete on PyPy than the same routine does on CPython. Perhaps
someone can give me some pointers? The core of the whole test is the
factorial function:

(ns clojure.examples.factorial)

(defn fact [x]
    (loop [n x f 1]
        (if (= n 1)
            f
            (recur (dec n) (* f n)))))

(defn test [times]
    (loop [rem times]
        (if (> rem 0)
            (do (fact 20000)
                (print rem)
                (recur (dec rem))))))

(test 20)

The generated code all seems to match normal Python bytecode, with one
exception: the implementation of *, = and >. In Clojure these
functions take 0 to n arguments and run different logic depending on
how many arguments they receive. Our solution is to collect the
arguments into __argsv__, check its length, and jump to a different
code block based on the length of the tuple. I know this code may not
be super fast, but it seems to work fine in CPython. Could this be the
pain point for us in PyPy? Any thoughts/ideas?
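
In Python terms, the dispatch amounts to roughly the following. This
is a hand-written sketch reconstructed from the disassembly below, not
the actual generated code: the name mul is a stand-in for *, and
functools.reduce stands in for clojure-py's reduce1, which I am
assuming takes (function, initial value, collection).

```python
from functools import reduce


def mul(*__argsv__):
    # One branch per arity, each guarded by a fresh __len__() call,
    # in the same order as the generated bytecode.
    if __argsv__.__len__() == 0:
        return 1
    if __argsv__.__len__() == 1:
        x = __argsv__[0]
        return x
    if __argsv__.__len__() == 2:
        x = __argsv__[0]
        y = __argsv__[1]
        return x * y
    if __argsv__.__len__() >= 2:
        x = __argsv__[0]
        y = __argsv__[1]
        more = __argsv__[2:]
        # Stand-in for reduce1(*, *(x, y), more): fold the remaining
        # arguments into the product of the first two.
        return reduce(mul, more, mul(x, y))
    # Fallthrough kept to mirror the bytecode; no tuple length reaches it.
    raise Exception()
```

So the two-argument call made inside fact goes through three
__len__() calls and three comparisons before it reaches the multiply.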

clojure.examples.factorial=> (dis.dis *)
  0           0 LOAD_FAST                0 (__argsv__)
              3 LOAD_ATTR                0 (__len__)
              6 CALL_FUNCTION            0
              9 LOAD_CONST               1 (0)
             12 COMPARE_OP               2 (==)
             15 POP_JUMP_IF_FALSE       22
             18 LOAD_CONST               2 (1)
             21 RETURN_VALUE
        >>   22 LOAD_FAST                0 (__argsv__)
             25 LOAD_ATTR                0 (__len__)
             28 CALL_FUNCTION            0
             31 LOAD_CONST               2 (1)
             34 COMPARE_OP               2 (==)
             37 POP_JUMP_IF_FALSE       54
             40 LOAD_FAST                0 (__argsv__)
             43 LOAD_CONST               1 (0)
             46 BINARY_SUBSCR
             47 STORE_FAST               1 (x)
             50 LOAD_FAST                1 (x)
             53 RETURN_VALUE
        >>   54 LOAD_FAST                0 (__argsv__)
             57 LOAD_ATTR                0 (__len__)
             60 CALL_FUNCTION            0
             63 LOAD_CONST               3 (2)
             66 COMPARE_OP               2 (==)
             69 POP_JUMP_IF_FALSE      100
             72 LOAD_FAST                0 (__argsv__)
             75 LOAD_CONST               1 (0)
             78 BINARY_SUBSCR
             79 STORE_FAST               1 (x)
             82 LOAD_FAST                0 (__argsv__)
             85 LOAD_CONST               2 (1)
             88 BINARY_SUBSCR
             89 STORE_FAST               2 (y)

1010          92 LOAD_FAST                1 (x)
             95 LOAD_FAST                2 (y)
             98 BINARY_MULTIPLY
             99 RETURN_VALUE
        >>  100 LOAD_FAST                0 (__argsv__)
            103 LOAD_ATTR                0 (__len__)
            106 CALL_FUNCTION            0
            109 LOAD_CONST               3 (2)
            112 COMPARE_OP               5 (>=)
            115 POP_JUMP_IF_FALSE      173
            118 LOAD_FAST                0 (__argsv__)
            121 LOAD_CONST               1 (0)
            124 BINARY_SUBSCR
            125 STORE_FAST               1 (x)
            128 LOAD_FAST                0 (__argsv__)
            131 LOAD_CONST               2 (1)
            134 BINARY_SUBSCR
            135 STORE_FAST               2 (y)
            138 LOAD_FAST                0 (__argsv__)
            141 LOAD_CONST               3 (2)
            144 SLICE+1
            145 STORE_FAST               3 (more)

1012         148 LOAD_GLOBAL              1 (reduce1)
            151 LOAD_GLOBAL              2 (*)
            154 LOAD_GLOBAL              2 (*)
            157 LOAD_FAST                1 (x)
            160 LOAD_FAST                2 (y)
            163 CALL_FUNCTION            2
            166 LOAD_FAST                3 (more)
            169 CALL_FUNCTION            3
            172 RETURN_VALUE
        >>  173 LOAD_CONST               4 (<type 'exceptions.Exception'>)
            176 CALL_FUNCTION            0
            179 RAISE_VARARGS            1
None


Thanks,

Timothy Baldridge


-- 
“One of the main causes of the fall of the Roman Empire was
that–lacking zero–they had no way to indicate successful termination
of their C programs.”
(Robert Firth)

