Asynchronous IO ideas for Python
Hi,

I've written an article about how I perceive the future of asynchronous I/O in Python. It's not something that should directly be incorporated into Python now, but I believe it's useful for the python-ideas list.

https://medium.com/@paulcolomiets/the-future-of-asynchronous-io-in-python-ce...

And a place for comments at Hacker News:

https://news.ycombinator.com/item?id=8662782

I hope this writeup is helpful :)

-- Paul
Relevant part of the video with the normal Python stats on the left and PyParallel on the right: https://www.youtube.com/watch?v=4L4Ww3ROuro#t=838

Transcribed stats:

Regular Python HTTP server:

```
Thread Stats   Avg     Stdev   Max
  Latency      4.93ms  714us   10ms
  Req/Sec      552     154     1.1k
10,480 requests in 10s, 69MB
1048 reqs/sec, 6.9MB/s
```

PyParallel (4-core Windows VM):

```
Thread Stats   Avg     Stdev   Max
  Latency      2.41ms  531us   10ms
  Req/Sec      1.74k   183     2.33k
32,831 requests in 10s, 216MB
3263 reqs/sec, 21MB/s
```

So basically a bit less than linear scaling with more cores (3263 / 1048 ≈ 3.1x on four cores), which isn't too bad for a full debug build running on a VM.

-----Original Message-----
From: Trent Nelson
Sent: Wednesday, November 26, 2014 3:36 PM
To: 'Paul Colomiets'; python-ideas
Subject: RE: [Python-ideas] Asynchronous IO ideas for Python

Have you seen this?: https://speakerdeck.com/trent/pyparallel-how-we-removed-the-gil-and-exploite...

I spend the first 80-ish slides on async I/O. (That was a year ago. I've done 2-3 sprints on it since then and have gotten it to a point where I can back up the claims with hard numbers on load-testing benchmarks, demonstrated in the most recent video: https://www.youtube.com/watch?v=4L4Ww3ROuro.)

Trent.
Trent,

Can you post source for the regular and pyparallel HTTP servers you used?
-- --Guido van Rossum (python.org/~guido)
Sure can! I dumped the entire contents of my PyParallel source repository (including the full build .exes/dlls, Visual Studio files, etc.) to here about an hour before I gave that presentation: https://github.com/pyparallel/release

So to replicate that test you'd clone that, then run one of the helpers, python33-http-server.bat or pyparallel-http-server.bat. (There's also http://trent.snakebite.net/pyparallel-0.1-3.3.5.zip, but it's 117MB, so... I'd recommend github.com.)

The Python 3.3 version is literally `python -m http.server`; the only change I made to the stdlib http/server.py is this:

```diff
% diff -u cpython.hg/Lib/http/server.py pyparallel-lib-http-server.py
--- cpython.hg/Lib/http/server.py       2014-08-14 16:00:29.000000000 -0400
+++ pyparallel-lib-http-server.py       2014-11-26 18:12:58.000000000 -0500
@@ -328,7 +328,7 @@
         conntype = self.headers.get('Connection', "")
         if conntype.lower() == 'close':
             self.close_connection = 1
-        elif (conntype.lower() == 'keep-alive' and
+        elif (conntype.lower() == 'keep-alive' or
               self.protocol_version >= "HTTP/1.1"):
             self.close_connection = 0
         # Examine the headers and look for an Expect directive
@@ -440,7 +440,7 @@
         version and the current date.
 
         """
-        self.log_request(code)
+        #self.log_request(code)
         self.send_response_only(code, message)
         self.send_header('Server', self.version_string())
         self.send_header('Date', self.date_time_string())
```

(The keep-alive tweak keeps HTTP/1.1 connections persistent even without an explicit Connection: keep-alive header, and commenting out log_request() disables per-request logging -- presumably so the stdlib server isn't handicapped during benchmarking.)

The PyParallel version is basically `python_d -m async.http.server`, which is this: https://github.com/pyparallel/release/blob/master/Lib/async/http/server.py#L...

I started with the stdlib version and mostly refactored it a bit for personal style reasons, then made it work for PyParallel. The only PyParallel-specific piece is actually this line: https://github.com/pyparallel/release/blob/master/Lib/async/http/server.py#L...

```python
return response.transport.sendfile(before, path, None)
```

Everything else is just normal Python, nothing special -- it just conforms to the current constraints of PyParallel. Basically, the HttpServer.data_received() method will be invoked from parallel threads, not the main interpreter thread.

To give you an idea how the protocol/transport stuff is wired up, the standalone launcher stuff is at the bottom of that file:

```python
ipaddr = socket.gethostbyname(socket.gethostname())
server = async.server(ipaddr, 8080)
async.register(transport=server, protocol=HttpServer)
async.run()
```

As for why I haven't publicized this stuff until now, to quote myself from that video... "It currently crashes, a lot. I know why it's crashing, I just haven't had the time to fix it yet. But hey, it's super fast until it does crash ;-)"

By crash, I mean I'm hitting an assert() in my code -- it happens after the benchmark runs and has to do with the asynchronous socket disconnect logic. I tried fixing it properly before giving that talk, but ran out of time (https://bitbucket.org/tpn/pyparallel/branch/3.3-px-pygotham-2014-sprint). I'll fix all of that in the next sprint... which will be... heh, hopefully around Christmas?

Oh, actually, the big takeaway from the PyGotham sprint was that I spent an evening re-applying all the wild commits and hackery I'd accumulated to a branch created from the 3.3.5 tag: https://bitbucket.org/tpn/pyparallel/commits/branch/3.3-px. So diff that against the 3.3.5 tag to get an idea of what interpreter changes I needed to make to get to this point. (I have no idea why I didn't pick a tag to work off when I first started -- I literally just started hacking on whatever my local tip was on, which was some indeterminate state between... 3.2 and 3.3?)
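To make the data_received() contract above concrete, here is a minimal sketch of what a trivial protocol might look like under PyParallel's 3.3-based build (where `async` is not yet a keyword). The EchoServer class and the exact data_received() signature are illustrative assumptions inferred from the wiring above, not copied from the PyParallel sources:

```python
import socket
import async  # PyParallel's parallel I/O module; only importable on its 3.3 fork

class EchoServer:
    # Assumed callback shape: PyParallel invokes this from parallel threads,
    # never the main interpreter thread, so it must not mutate state shared
    # with the main thread.
    def data_received(self, transport, data):
        return data  # assumption: returned bytes are sent back to the client

ipaddr = socket.gethostbyname(socket.gethostname())
server = async.server(ipaddr, 8080)
async.register(transport=server, protocol=EchoServer)
async.run()
```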
Side note: I'm really happy with how everything has worked out so far; it is exactly how I envisioned it way back in those python-ideas@ discussions that resulted in tulip/asyncio. I was seeing ridiculously good scaling on my beefier machine at home (8 cores, running native) -- to the point where I was maxing out the client machine at about 50,000 requests/sec (~100MB/s) and the PyParallel box was only at about 40% CPU use.

Oh, and it appears to be much faster than node.js's http-server too (`npm install http-server`, cd into the website directory, `http-server -s .` to get an equivalent HTTP server from node.js), which I thought was cute. Well, I expected it to be -- that's the whole point of being able to exploit all cores and not doing single-threaded multiplexing -- so it was good to see that being the case.

Node wasn't actually that much faster than Python's normal http.server, if I remember correctly. It definitely used less CPU overall than the Python one -- basically what I'm seeing is that Python will be maxing out one core, which should only be 25% CPU (4-core VM), but actual CPU use is up around 50%, and it's mostly kernel time making up the other half. Node will also max out a core, but overall CPU use is ~30%. I attribute this to Python's http.server using select(), whereas I believe node.js ends up using IOCP in a single-threaded event loop. So, you could expect Python asyncio to get similar performance to node, but they're both crushed by PyParallel (until it crashes, heh) as soon as you've got more than one core, which was the point I've been vehemently making from day one ;-)

And I just realized I'm writing this e-mail on the same laptop that did that demo, so I can actually back all of this up with a quick run now.

Python 3.3

On Windows:

```
C:\Users\Trent\src\pyparallel-0.1-3.3.5
λ python33-http-server.bat
Serving HTTP on 0.0.0.0 port 8000 ...
```

On Mac:

```
(trent@raptor:ttys003) (Wed/19:06) .. (~s/wrk)
% ./wrk -c 8 -t 2 -d 10 --latency http://192.168.46.131:8000/index.html
Running 10s test @ http://192.168.46.131:8000/index.html
  2 threads and 8 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     6.33ms    1.74ms  18.42ms   75.65%
    Req/Sec   419.77    119.93   846.00     67.43%
  Latency Distribution
     50%    6.26ms
     75%    7.15ms
     90%    8.21ms
     99%   12.42ms
  8100 requests in 10.00s, 53.48MB read
Requests/sec:    809.92
Transfer/sec:      5.35MB
```

Node.js

On Windows:

```
C:\Users\Trent\src\pyparallel-0.1-3.3.5\website
λ http-server -s .
```

On Mac:

```
(trent@raptor:ttys003) (Wed/19:07) .. (~s/wrk)
% ./wrk -c 8 -t 2 -d 10 --latency http://192.168.46.131:8080/index.html
Running 10s test @ http://192.168.46.131:8080/index.html
  2 threads and 8 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     6.44ms    2.40ms  19.70ms   84.77%
    Req/Sec   621.94    124.26     0.94k    68.05%
  Latency Distribution
     50%    5.93ms
     75%    7.00ms
     90%    8.97ms
     99%   16.17ms
  12021 requests in 10.00s, 80.84MB read
Requests/sec:   1201.98
Transfer/sec:      8.08MB
```
PyParallel

On Windows:

```
C:\Users\Trent\src\pyparallel-0.1-3.3.5
λ pyparallel-http-server.bat
Serving HTTP on 192.168.46.131 port 8080 ...
Traceback (most recent call last):
  File "C:\Users\Trent\src\pyparallel-0.1-3.3.5\Lib\runpy.py", line 160, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "C:\Users\Trent\src\pyparallel-0.1-3.3.5\Lib\runpy.py", line 73, in _run_code
    exec(code, run_globals)
  File "C:\Users\Trent\src\pyparallel-0.1-3.3.5\Lib\ctk\cli.py", line 518, in <module>
    cli = run(*args)
  File "C:\Users\Trent\src\pyparallel-0.1-3.3.5\Lib\ctk\cli.py", line 488, in run
    return CLI(*args, **kwds)
  File "C:\Users\Trent\src\pyparallel-0.1-3.3.5\Lib\ctk\cli.py", line 272, in __init__
    self.run()
  File "C:\Users\Trent\src\pyparallel-0.1-3.3.5\Lib\ctk\cli.py", line 278, in run
    self._process_commandline()
  File "C:\Users\Trent\src\pyparallel-0.1-3.3.5\Lib\ctk\cli.py", line 424, in _process_commandline
    cl.run(args)
  File "C:\Users\Trent\src\pyparallel-0.1-3.3.5\Lib\ctk\cli.py", line 217, in run
    self.command.start()
  File "C:\Users\Trent\src\pyparallel-0.1-3.3.5\Lib\ctk\command.py", line 455, in start
    self.run()
  File "C:\Users\Trent\src\pyparallel-0.1-3.3.5\Lib\px\commands.py", line 90, in run
    async.run()
OSError: [WinError 8] Not enough storage is available to process this command
_PyParallel_Finalize(): px->contexts_active: 462
[92105 refs]
_PyParallel_DeletingThreadState(): px->contexts_active: 462
```

Oh dear :-) Hadn't seen that before. The VM has 4GB allocated to it... I checked taskmgr and it was reporting ~90% physical memory use. Closed a bunch of things and got it down to 54%, then re-ran; that did the trick. Including this info in case anyone else runs into this.

Re-run:

```
C:\Users\Trent\src\pyparallel-0.1-3.3.5
λ pyparallel-http-server.bat
Serving HTTP on 192.168.46.131 port 8080 ...
```

On Mac:

```
(trent@raptor:ttys003) (Wed/19:16) .. (~s/wrk)
% ./wrk -c 8 -t 2 -d 10 --latency http://192.168.46.131:8080/index.html
Running 10s test @ http://192.168.46.131:8080/index.html
  2 threads and 8 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.04ms    1.80ms  23.35ms   91.16%
    Req/Sec     1.07k   191.81     1.54k    75.00%
  Latency Distribution
     50%    3.68ms
     75%    4.41ms
     90%    5.40ms
     99%   13.04ms
  20317 requests in 10.00s, 134.22MB read
Requests/sec:   2031.33
Transfer/sec:     13.42MB
```

And then back on Windows after the benchmark completes:

```
C:\Users\Trent\src\pyparallel-0.1-3.3.5
λ pyparallel-http-server.bat
Serving HTTP on 192.168.46.131 port 8080 ...
Assertion failed: s->io_op == PxSocket_IO_SEND, file ..\Python\pyparallel.c, line 6311
Assertion failed: s->io_op == PxSocket_IO_SEND, file ..\Python\pyparallel.c, line 6311
Assertion failed: s->io_op == PxSocket_IO_SEND, file ..\Python\pyparallel.c, line 6311
Assertion failed: s->io_op == PxSocket_IO_SEND, file ..\Python\pyparallel.c, line 6311
Assertion failed: s->io_op == PxSocket_IO_SEND, file ..\Python\pyparallel.c, line 6311
Assertion failed: s->io_op == PxSocket_IO_SEND, file ..\Python\pyparallel.c, line 6311
```

Heh. That's the crashing I was referring to.

So basically, it's better in every category (lowest latency, lowest jitter (stddev), highest throughput) for the duration of the benchmark, then crashes :-)

(Basically, my DisconnectEx assumptions regarding overlapped sockets, socket resource reuse, I/O completion ports, and thread pools were... not correct, apparently.)

I remain committed to the assertion that Windows' kernel approach to asynchronous I/O via I/O completion ports is fundamentally superior to the UNIX/POSIX approach in every aspect if you want to optimally use contemporary multicore hardware (exploiting all cores as efficiently as possible). I talk about this in more detail here: https://speakerdeck.com/trent/parallelism-and-concurrency-with-python?slide=....
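(For comparison, stock CPython 3.4 already exposes both kernel models through asyncio's two event loop implementations, though each still drives a single interpreter thread -- which is exactly the limitation PyParallel removes. A minimal sketch:)

```python
import sys
import asyncio

# Completion-based vs readiness-based loops in Python 3.4's asyncio:
# ProactorEventLoop (Windows only) is built on IOCP and is handed completed
# operations; SelectorEventLoop polls select/epoll/kqueue for readiness and
# then performs the I/O itself.
if sys.platform == "win32":
    loop = asyncio.ProactorEventLoop()
else:
    loop = asyncio.SelectorEventLoop()
asyncio.set_event_loop(loop)
```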
But good grief, it is orders of magnitude more complex at every level. A less stubborn version of me would have given up waaaay earlier. Glad I stuck with it, though -- really happy with the results so far.

Trent.
2014-11-27 1:18 GMT+00:00 Trent Nelson <trent@snakebite.org>:
Everything else is just normal Python, nothing special -- it just conforms to the current constraints of PyParallel. Basically, the HttpServer.data_received() method will be invoked from parallel threads, not the main interpreter thread.
So, still no garbage collection from the threads?
On 27 November 2014 at 03:35, Paul Colomiets <paul@colomiets.name> wrote:
Hi,
I've written an article about how I perceive the future of asynchronous I/O in Python. It's not something that should directly be incorporated into python now, but I believe it's useful for python-ideas list.
https://medium.com/@paulcolomiets/the-future-of-asynchronous-io-in-python-ce...
Thanks Paul, that's an interesting write-up.

Another couple of potentially relevant projects in the higher-level "service discovery" space (beyond ZooKeeper, which you already mention) are Fedora's fedmsg (http://www.fedmsg.com/en/latest/ -- since also adopted by Debian, I believe) and the Zato ESB project (https://zato.io/). Those are the kinds of things a service discovery plugin system would want to be able to handle.

Cheers,
Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Nov 26, 2014, at 9:30 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Another couple of potentially relevant projects in the higher level "service discovery" space (beyond Zookeeper, which you already mention) are Fedora's fedmsg (http://www.fedmsg.com/en/latest/ - since also adopted by Debian I believe) and the Zato ESB project (https://zato.io/). Those are the kinds of things a service discovery plugin system would want to be able to handle.
We’re using consul for service discovery in psf-salt, FWIW. It's been pretty good for us so far. It implements service discovery along with health checks, so that if a service starts failing health checks it gets kicked out of the service discovery rotation for that service. It also implements a DNS resolver, so you can use DNS for discovery instead of its API if you want.
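To give a feel for the DNS side, here is a sketch that asks a local consul agent for the healthy instances of a service. It uses the third-party dnspython package, and the service name web.service.consul is an illustrative assumption (8600 is consul's default DNS port):

```python
import dns.resolver  # third-party: pip install dnspython

# Sketch only: assumes a consul agent answering DNS on 127.0.0.1:8600
# and a service registered under the hypothetical name "web".
resolver = dns.resolver.Resolver(configure=False)
resolver.nameservers = ["127.0.0.1"]
resolver.port = 8600  # consul's default DNS port

# consul only answers with instances currently passing their health checks
for srv in resolver.query("web.service.consul", "SRV"):
    print(srv.target, srv.port)
```

--- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA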
Paul Colomiets <paul@colomiets.name> wrote:
I've written an article about how I perceive the future of asynchronous I/O in Python. It's not something that should directly be incorporated into python now, but I believe it's useful for python-ideas list.
https://medium.com/@paulcolomiets/the-future-of-asynchronous-io-in-python-ce...
This is approximately how asynchronous I/O is implemented on Windows (IOCP) and on Mac and FreeBSD (GCD): you have a thread pool that operates independently of the main program thread, and you enqueue I/O operations as work tasks to the thread pool. You suggest having a singleton "I/O kernel" thread in Python, but that is actually not that different from having a pool of worker threads.

In a very minimalistic way, one could implement something similar to the set of I/O functions present in GCD. That API is already designed to have the smallest possible complexity, and yet it is as powerful as Windows' IOCP for most practical purposes:

https://developer.apple.com/library/mac/documentation/Performance/Reference/...

On Windows it could run on top of IOCP; on Mac and FreeBSD it could run on top of GCD and thus use kqueue under the hood. Linux would actually be the hardest platform, but IOCPs have been implemented with epoll and a thread pool in Wine.

Sturla
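As a rough illustration of that "enqueue I/O to a pool, get a completion callback" model, here is a sketch built on the stdlib's concurrent.futures rather than a real kernel-managed queue; the read_file helper, the pool size, and the file path are made up for the example:

```python
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=4)  # stands in for the kernel's pool

def read_file(path):
    # A blocking I/O operation, submitted to the pool as a work task.
    with open(path, "rb") as f:
        return f.read()

def on_done(future):
    # Runs on a pool thread once the read completes -- roughly analogous to
    # a GCD completion block or an IOCP completion packet.
    print(len(future.result()), "bytes read")

future = pool.submit(read_file, "/etc/hosts")
future.add_done_callback(on_done)
pool.shutdown(wait=True)
```

A real GCD- or IOCP-backed version differs mainly in who owns the threads: the kernel sizes and schedules the pool rather than the interpreter.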
participants (7)

- Charles-François Natali
- Donald Stufft
- Guido van Rossum
- Nick Coghlan
- Paul Colomiets
- Sturla Molden
- Trent Nelson