Syscall Interface slowness
Hello,

Today I was experimenting with CPython and PyPy. I was very impressed by PyPy's performance: for operations in user space, it was almost 20 times faster than CPython.

Then I decided to switch our Python CLI to PyPy. I ran one of the major commands in our CLI and the results were worse than with CPython. It got slower! So I started to research it more. Our CLI calls multiple other programs, reads a lot of configuration data, and creates many files, which means all of those operations involve system calls.

Then I ran some simple test cases: reading and writing millions of lines to a file, and creating and killing multiple processes. All of these operations were almost 5 times slower than with CPython. I ran my tests on both macOS and RHEL with the latest version of PyPy3.7.

My question is: is this something known? Or could it be an area for improvement that could be contributed to?

Best,
Emre Yavuz
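For readers who want to try this kind of comparison themselves, here is a minimal sketch of a system-call-heavy micro-benchmark. It is not the test from the message above: the function names, the line count, and the process count are all assumptions chosen for illustration. The idea is to run the same file under both interpreters (for example with `python3` and `pypy3`) and compare the printed times.

```python
import os
import subprocess
import sys
import tempfile
import time


def write_lines(path, n_lines):
    """Write n_lines short lines, one write() call per line."""
    with open(path, "w") as f:
        for i in range(n_lines):
            f.write(f"line {i}\n")


def read_lines(path):
    """Read the file back line by line and return the line count."""
    count = 0
    with open(path) as f:
        for _ in f:
            count += 1
    return count


def spawn_processes(n_procs):
    """Start and wait for n_procs trivial child interpreters."""
    for _ in range(n_procs):
        subprocess.run([sys.executable, "-c", "pass"], check=True)


def timed(label, func, *args):
    """Run func(*args), print the elapsed wall-clock time, return the result."""
    start = time.perf_counter()
    result = func(*args)
    print(f"{label}: {time.perf_counter() - start:.3f} s")
    return result


def main(n_lines=200_000, n_procs=20):
    # Sizes are illustrative assumptions; scale them up for real measurements.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.txt")
        timed("write", write_lines, path, n_lines)
        timed("read", read_lines, path)
    timed("spawn", spawn_processes, n_procs)


if __name__ == "__main__":
    main()
```

Short runs like this mostly measure interpreter start-up and un-warmed code paths, so the numbers are only a rough first indication and may differ between machines.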
I have a system call-heavy program (https://stromberg.dnsalias.org/~strombrg/backshift/) that is faster with PyPy on one machine, and faster with CPython+Cython on another. Same code, different machines, different relative speeds for the two implementations.

For a long time, I thought PyPy was just faster at CPU and slower at I/O, but it turns out that's not always true.

HTH.
-- Dan Stromberg
That's really interesting! I had also started to think it's slower at I/O. I also ran my tests on Ubuntu and Debian, with the same result.

What type of system runs PyPy faster?

Have you or anyone else had a chance to look at what's making it slower? The difference is huge when it comes to system calls.

I am planning to dive into the code to find out more, if it's not a known issue.

Best,
Emre Yavuz
The system that's faster with PyPy is an AMD FX(tm)-4300 Quad-Core Processor. The system that's faster with CPython+Cython is an AMD Athlon(tm) II X3 455 Processor.
-- Dan Stromberg
Hi Carl,

Sorry, I didn't receive your message (probably something is wrong in my mail configuration), but I saw it in the digest.

I created a self-contained program for this example. I am using "writelines".

Here is the pastebin link to the program: https://pastebin.com/D6auMcwN

Command line:

(devpy) emreyavuz:tmp emreyavuz$ python3 test-writer
Time has been spend for copying data 28 secs
(devpy) emreyavuz:tmp emreyavuz$ pypy3 test-writer
Time has been spend for copying data 118 secs

Best,
Emre Yavuz
Hi Emre,

thanks for that, it's great! There's nothing deeply wrong in PyPy here, just plain optimization work needed. We're tracking it here, in the issue I already linked:

https://foss.heptapod.net/pypy/pypy/-/issues/3126

If you find more like this, please let us know!

Cheers,
CF
Ah, I found one problem: your script uses .writelines with a string as the argument. writelines usually takes an iterable of strings (like a list), but it will also work with a string argument, and then do:

    for char in s:
        w.write(char)

If I replace the writelines(...) with a .write(...), it becomes much faster, both on CPython and PyPy.

CF
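The effect described above can be checked with a small sketch. `CountingWriter` is a hypothetical helper written for this illustration; it is not taken from the script under discussion or from PyPy. It assumes that the `writelines` method inherited from `io.TextIOBase` calls the object's `write` once per item it iterates over.

```python
import io


class CountingWriter(io.TextIOBase):
    """Hypothetical text sink that counts how often write() is called."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.chunks = []

    def writable(self):
        return True

    def write(self, s):
        self.calls += 1
        self.chunks.append(s)
        return len(s)


text = "first line\nsecond line\n"

as_string = CountingWriter()
as_string.writelines(text)  # a str is iterated one character at a time

as_list = CountingWriter()
as_list.writelines(text.splitlines(keepends=True))  # one write() per line

single = CountingWriter()
single.write(text)  # one write() for the whole string

print(as_string.calls, as_list.calls, single.calls)  # prints: 23 2 1
```

All three variants produce the same text; only the number of `write` calls differs, which is the per-call overhead the message above points at.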
Hi Carl,

Thank you for pointing me to the issue link; I'll be following it.

Yes, I suppose "writelines" took much longer in that case. I would have expected the two to perform the same, but it's good to know that with a few tweaks it can actually get faster.

I'll look further into the other things that made the CLI slower, and if I find something interesting, I'll let you know for sure!

Best,
Emre Yavuz
Hi Emre,

IO can definitely be slower than on CPython, but not in all cases. A few of them are known and we are trying to improve them, e.g. readline operations on files:

https://foss.heptapod.net/pypy/pypy/-/issues/3126

5x is definitely too much of a difference. If you can minimize that to a small self-contained example, that would be a very valuable bug report and we would definitely try to fix it.

Cheers,
Carl Friedrich
_______________________________________________ pypy-dev mailing list pypy-dev@python.org https://mail.python.org/mailman/listinfo/pypy-dev
Participants (3):
- Carl Friedrich Bolz-Tereick
- Dan Stromberg
- Emre Yavuz