Question: Is there any faster way to run benchmarks on pypy/module/cpyext module?

Hello PyPy experts! What I want to do is run a performance test on the cpyext module. The translation & compilation procedure took about 1 hour on my computer, and after I modified some code (just a slight modification under pypy/module/cpyext), the whole translation procedure seemed to restart from scratch, taking another hour. I found that pytest can run the unit tests, but those seem to run on a different interpreter such as CPython 2 or another pre-built PyPy interpreter. So I was wondering if there is any way to run performance benchmarks partially, without compiling the whole project. Thanks! -- zy hou

On 28/5/23 21:55, hzy15610046011 via pypy-dev wrote:
The short answer is "you need to do a complete translation in order to benchmark". But you can run some microbenchmarks like in this blog post [0] and use kcachegrind to look for inefficiencies if you know what you are looking for. Is there some particular code pattern you would like to speed up?

Matti

[0] https://www.pypy.org/posts/2018/09/inside-cpyext-why-emulating-cpython-c-808...
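For reference, here is a minimal sketch of a timing context manager of the kind such microbenchmarks tend to use. The `Timer` name matches the example later in this thread, but the implementation below is only an assumption, not the actual harness code:

====================
# Hypothetical sketch of a Timer helper like the one used in the
# microbenchmark examples below; not the actual harness code.
import time
from contextlib import contextmanager

@contextmanager
def Timer(label):
    # start a high-resolution wall-clock timer
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print("%s: %.3fs" % (label, elapsed))
====================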

Thanks for your answer! I read the link [0] and ran those benchmarks, and found that the slow performance of allocate_tuple still exists today: on my CPU it was 0.65s for CPython 3.8 and 7.6s for PyPy 3.9. I am wondering if this could be the cause of the disappointing performance of pandas.

As for my code pattern: I am trying to add PyPy support to my simulation library, and the workflow of my project looks like this:

1. Read data from Excel files.
2. Perform CPU-intensive computations with an object-oriented program involving a number of objects.
3. Write the simulation data to an SQLite database with pandas, simply using `pd.to_sql`.

When running on PyPy, step 2 was more than 8 times faster than on the CPython interpreter, but step 3 was 3~5 times slower. I have found that this is not caused by the sqlite3 library shipped with PyPy, because on a pure-Python SQLite program PyPy was 1.2x~2x faster than CPython. So the problem is probably the C-API overhead when calling pandas.

As far as I can see, (1) the easiest way out is to write a pure-Python table I/O library instead of using pandas, because my project only imports a few functions from pandas. (2) But if the performance of pandas on PyPy ever becomes better (about 0.5x~0.8x of that on CPython), it would be preferable to keep using pandas, because most Python programmers know it. Could you please give me some suggestions on how to approach this? Should I choose (1) and implement a pure-Python table library, or is it better to wait for (2)? I am also interested in the PyPy project itself, and wondering whether improving the performance of `Py_BuildValue` is feasible. Thanks!

Hou
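For illustration, a rough sketch of what option (1) might look like: writing a list of plain dicts straight to SQLite with the stdlib sqlite3 module, bypassing pandas entirely. The table and column names here are invented for the example, not taken from the actual project:

====================
# Hypothetical sketch of a pure-Python replacement for pd.to_sql.
# The "results" table and its columns are illustrative only.
import sqlite3

def write_rows(db_path, rows):
    columns = sorted(rows[0].keys())
    placeholders = ", ".join("?" for _ in columns)
    con = sqlite3.connect(db_path)
    try:
        # SQLite allows columns without explicit types
        con.execute("CREATE TABLE IF NOT EXISTS results (%s)" % ", ".join(columns))
        con.executemany(
            "INSERT INTO results (%s) VALUES (%s)" % (", ".join(columns), placeholders),
            [tuple(row[c] for c in columns) for row in rows],
        )
        con.commit()
    finally:
        con.close()

write_rows("simulation.sqlite", [{"step": 0, "value": 1.5}, {"step": 1, "value": 2.5}])
====================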

On 29/5/23 07:37, hzy15610046011 via pypy-dev wrote:
I would think a properly written pure-Python solution (1) could outperform any C extension on PyPy, but I don't think such a project exists. For (2), in the long term there is HPy [0]. In the short term, there are many possible optimizations we could do for cpyext. Are you sure Py_BuildValue is at the top of the list, i.e. did you profile `pd.to_sql` and did that come out as a very common and slower-than-CPython function?

Matti

[0] https://hpyproject.org/
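As an aside, one way to answer that profiling question would be something like the following sketch, which runs `pd.to_sql` under cProfile on a small synthetic DataFrame; the DataFrame contents and the in-memory database are assumptions made for the example:

====================
# Hypothetical profiling sketch: run the slow step under cProfile and
# print the functions with the highest cumulative time.
import cProfile
import pstats
import sqlite3
import pandas as pd

df = pd.DataFrame([{"step": i, "value": i * 1.5} for i in range(10000)])
con = sqlite3.connect(":memory:")

profiler = cProfile.Profile()
profiler.enable()
df.to_sql("results", con, if_exists="replace", index=False)
profiler.disable()

pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)
====================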

I found one operation leading to the slowdown. When the input is a list of dictionaries, pandas first converts it to a series of numpy.ndarray objects, and this conversion can be slow on PyPy. I made a small example inside the microbenchmark project like this:

====================
def bench_build_np():
    with Timer('np.array'):
        for i in xrange(N//10):
            myarray = np.array([i])
====================

On PyPy it took about 2.3s, and on CPython about 0.22s. I think I should try to find out why PyPy is slow in this case, but although I have read the docs several times, I am not familiar with the internals of the PyPy project. Could you please give me some advice on why dynamically creating NumPy arrays is slow, based on how PyPy works?
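For anyone who wants to reproduce this outside the benchmark harness, a self-contained variant of the same loop could look like this; the iteration count below is arbitrary, not the harness's N//10:

====================
# Self-contained version of the np.array microbenchmark above.
import time
import numpy as np

def bench_build_np(iterations=1000000):
    start = time.perf_counter()
    for i in range(iterations):
        # one tiny array per iteration, stressing the C-API boundary
        myarray = np.array([i])
    return time.perf_counter() - start

if __name__ == "__main__":
    print("np.array: %.3fs" % bench_build_np())
====================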

participants (2)
- hzy15610046011
- Matti Picus