2014-11-27 1:18 GMT+00:00 Trent Nelson <trent@snakebite.org>:
Everything else is just normal Python, nothing special -- it just conforms to the current constraints of PyParallel. Basically, the HttpServer.data_received() method will be invoked from parallel threads, not the main interpreter thread.
So, still no garbage collection from the threads?
To give you an idea how the protocol/transport stuff is wired up, the standalone launcher stuff is at the bottom of that file:
import socket
import async   # PyParallel's async module (part of this branch, not stdlib)

# Bind to the machine's primary IP and listen on port 8080.
ipaddr = socket.gethostbyname(socket.gethostname())
server = async.server(ipaddr, 8080)

# Tie the protocol class to the listening transport, then hand
# control over to PyParallel's event loop.
async.register(transport=server, protocol=HttpServer)
async.run()
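And a minimal sketch of the shape of such a protocol class, purely for illustration -- the data_received() signature and the return-bytes-to-send convention here are assumptions, not necessarily the real interface, so check the repo for the actual details:

class HttpServer:
    # Invoked from a parallel thread (not the main interpreter thread)
    # for each chunk of data received on a connection.
    def data_received(self, transport, data):
        body = b'Hello, PyParallel!\r\n'
        header = ('HTTP/1.1 200 OK\r\n'
                  'Content-Type: text/plain\r\n'
                  'Content-Length: %d\r\n'
                  '\r\n' % len(body)).encode('ascii')
        # Assumption: bytes returned from data_received() are sent
        # back to the client by the transport.
        return header + body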
As for why I haven't publicized this stuff until now, to quote myself from that video... "It currently crashes, a lot. I know why it's crashing, I just haven't had the time to fix it yet. But hey, it's super fast until it does crash ;-)"
By crash, I mean I'm hitting an assert() in my code -- it happens after the benchmark runs and has to do with the asynchronous socket disconnect logic. I tried fixing it properly before giving that talk, but ran out of time (https://bitbucket.org/tpn/pyparallel/branch/3.3-px-pygotham-2014-sprint).
I'll fix all of that the next sprint... which will be... heh, hopefully around Christmas?
Oh, actually, the big takeaway from the PyGotham sprint was that I spent an evening re-applying all the wild commits and hackery I'd accumulated to a branch created from the 3.3.5 tag: https://bitbucket.org/tpn/pyparallel/commits/branch/3.3-px. So diff that against the 3.3.5 tag to get an idea of what interpreter changes I needed to make to get to this point. (I have no idea why I didn't pick a tag to work off when I first started -- I literally just started hacking on whatever my local tip was at, which was some indeterminate state between... 3.2 and 3.3?)
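To actually produce that diff, something like this should work against the Mercurial repo (untested here; v3.3.5 is CPython's tag name for that release):

hg diff -r v3.3.5 -r 3.3-px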
Side note: I'm really happy with how everything has worked out so far, it is exactly how I envisioned it way back in those python-ideas@ discussions that resulted in tulip/asyncio. I was seeing ridiculously good scaling on my beefier machine at home (8 core, running native) -- to the point where I was maxing out the client machine at about 50,000 requests/sec (~100MB/s) and the PyParallel box was only at about 40% CPU use.
Oh, and it appears to be much faster than node.js's http-server too (`npm install http-server`, cd into the website directory, then `http-server -s .` to get an equivalent HTTP server from node.js), which I thought was cute. Well, I expected it to be -- being able to exploit all cores instead of doing single-threaded multiplexing is the whole point -- so it was good to see that being the case.
Node wasn't actually that much faster than Python's normal http.server if I remember correctly. It definitely used less CPU overall than the Python one -- basically what I'm seeing is that Python will be maxing out one core, which should only be 25% CPU (4 core VM), but actual CPU use is up around 50%, and it's mostly kernel time making up the other half. Node will also max out a core, but overall CPU use is ~30%. I attribute this to Python's http.server using select(), whereas I believe node.js ends up using IOCP in a single-threaded event loop. So, you could expect Python asyncio to get similar performance to node, but they're both crushed by PyParallel (until it crashes, heh) as soon as you've got more than one core, which was the point I've been vehemently making from day one ;-)
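For anyone who hasn't stared at that model before, single-threaded readiness multiplexing -- what http.server's select() loop boils down to, and roughly the shape of node's event loop, IOCP plumbing aside -- looks like this generic stdlib sketch (illustrative only, not the actual implementation of either):

import select
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(('0.0.0.0', 8000))
server.listen(128)

sockets = [server]
while True:
    # One thread asks "which sockets are ready?" and then performs all
    # the I/O itself -- only one core ever does work in this model.
    readable, _, _ = select.select(sockets, [], [])
    for s in readable:
        if s is server:
            conn, _ = server.accept()
            sockets.append(conn)
        else:
            data = s.recv(4096)
            if data:
                s.sendall(data)   # echo; a real server would parse HTTP here
            else:
                sockets.remove(s)
                s.close()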
And I just realized I'm writing this e-mail on the same laptop that did that demo, so I can actually back all of this up with a quick run now.
Python 3.3
On Windows:
C:\Users\Trent\src\pyparallel-0.1-3.3.5
λ python33-http-server.bat
Serving HTTP on 0.0.0.0 port 8000 ...
On Mac:
(trent@raptor:ttys003) (Wed/19:06) .. (~s/wrk)
% ./wrk -c 8 -t 2 -d 10 --latency http://192.168.46.131:8000/index.html
Running 10s test @ http://192.168.46.131:8000/index.html
2 threads and 8 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     6.33ms    1.74ms  18.42ms   75.65%
    Req/Sec    419.77    119.93   846.00    67.43%
Latency Distribution
50% 6.26ms
75% 7.15ms
90% 8.21ms
99% 12.42ms
8100 requests in 10.00s, 53.48MB read
Requests/sec: 809.92
Transfer/sec: 5.35MB
Node.js
On Windows:
C:\Users\Trent\src\pyparallel-0.1-3.3.5\website
λ http-server -s .
On Mac:
(trent@raptor:ttys003) (Wed/19:07) .. (~s/wrk)
% ./wrk -c 8 -t 2 -d 10 --latency http://192.168.46.131:8080/index.html
Running 10s test @ http://192.168.46.131:8080/index.html
2 threads and 8 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     6.44ms    2.40ms  19.70ms   84.77%
    Req/Sec    621.94    124.26     0.94k   68.05%
Latency Distribution
50% 5.93ms
75% 7.00ms
90% 8.97ms
99% 16.17ms
12021 requests in 10.00s, 80.84MB read
Requests/sec: 1201.98
Transfer/sec: 8.08MB
PyParallel
On Windows:
C:\Users\Trent\src\pyparallel-0.1-3.3.5
λ pyparallel-http-server.bat
Serving HTTP on 192.168.46.131 port 8080 ...
Traceback (most recent call last):
  File "C:\Users\Trent\src\pyparallel-0.1-3.3.5\Lib\runpy.py", line 160, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "C:\Users\Trent\src\pyparallel-0.1-3.3.5\Lib\runpy.py", line 73, in _run_code
    exec(code, run_globals)
  File "C:\Users\Trent\src\pyparallel-0.1-3.3.5\Lib\ctk\cli.py", line 518, in <module>
    cli = run(*args)
  File "C:\Users\Trent\src\pyparallel-0.1-3.3.5\Lib\ctk\cli.py", line 488, in run
    return CLI(*args, **kwds)
  File "C:\Users\Trent\src\pyparallel-0.1-3.3.5\Lib\ctk\cli.py", line 272, in __init__
    self.run()
  File "C:\Users\Trent\src\pyparallel-0.1-3.3.5\Lib\ctk\cli.py", line 278, in run
    self._process_commandline()
  File "C:\Users\Trent\src\pyparallel-0.1-3.3.5\Lib\ctk\cli.py", line 424, in _process_commandline
    cl.run(args)
  File "C:\Users\Trent\src\pyparallel-0.1-3.3.5\Lib\ctk\cli.py", line 217, in run
    self.command.start()
  File "C:\Users\Trent\src\pyparallel-0.1-3.3.5\Lib\ctk\command.py", line 455, in start
    self.run()
  File "C:\Users\Trent\src\pyparallel-0.1-3.3.5\Lib\px\commands.py", line 90, in run
    async.run()
OSError: [WinError 8] Not enough storage is available to process this command
_PyParallel_Finalize(): px->contexts_active: 462
[92105 refs]
_PyParallel_DeletingThreadState(): px->contexts_active: 462
Oh dear :-) Hadn't seen that before. The VM has 4GB allocated to it... I checked taskmgr and it was reporting ~90% physical memory use. I closed a bunch of things, got it down to 54%, then re-ran; that did the trick. Including this info in case anyone else runs into it.
Re-run:
C:\Users\Trent\src\pyparallel-0.1-3.3.5
λ pyparallel-http-server.bat
Serving HTTP on 192.168.46.131 port 8080 ...
On Mac:
(trent@raptor:ttys003) (Wed/19:16) .. (~s/wrk)
% ./wrk -c 8 -t 2 -d 10 --latency http://192.168.46.131:8080/index.html
Running 10s test @ http://192.168.46.131:8080/index.html
2 threads and 8 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     4.04ms    1.80ms  23.35ms   91.16%
    Req/Sec      1.07k   191.81     1.54k   75.00%
Latency Distribution
50% 3.68ms
75% 4.41ms
90% 5.40ms
99% 13.04ms
20317 requests in 10.00s, 134.22MB read
Requests/sec: 2031.33
Transfer/sec: 13.42MB
And then back on Windows after the benchmark completes:
C:\Users\Trent\src\pyparallel-0.1-3.3.5
λ pyparallel-http-server.bat
Serving HTTP on 192.168.46.131 port 8080 ...
Assertion failed: s->io_op == PxSocket_IO_SEND, file ..\Python\pyparallel.c, line 6311
Assertion failed: s->io_op == PxSocket_IO_SEND, file ..\Python\pyparallel.c, line 6311
Assertion failed: s->io_op == PxSocket_IO_SEND, file ..\Python\pyparallel.c, line 6311
Assertion failed: s->io_op == PxSocket_IO_SEND, file ..\Python\pyparallel.c, line 6311
Assertion failed: s->io_op == PxSocket_IO_SEND, file ..\Python\pyparallel.c, line 6311
Assertion failed: s->io_op == PxSocket_IO_SEND, file ..\Python\pyparallel.c, line 6311
Heh. That's the crashing I was referring to.
So basically, it's better in every category (lowest latency, lowest jitter (stddev), highest throughput) for the duration of the benchmark, then crashes :-)
(Basically, my DisconnectEx assumptions regarding overlapped sockets, socket resource reuse, I/O completion ports, and thread pools were... not correct apparently.)
I remain committed to the assertion that the Windows kernel's approach to asynchronous I/O via I/O completion ports is fundamentally superior to the UNIX/POSIX approach in every respect if you want to use contemporary multicore hardware optimally (i.e. exploit all cores as efficiently as possible). I talk about this in more detail here: https://speakerdeck.com/trent/parallelism-and-concurrency-with-python?slide=....
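To make the contrast with the select() loop above concrete: in the completion model you never ask "which sockets are ready?" -- you park a pool of threads on a single completion port and the kernel hands each finished I/O to whichever thread (and so whichever core) is available. Here's a crude, portable sketch of that shape, with queue.Queue standing in for the completion port and the producer standing in for the kernel -- this is the pattern, emphatically not IOCP itself, and in stock CPython the GIL would serialize these workers, which is exactly the serialization PyParallel removes:

import queue
import threading

completion_port = queue.Queue()   # stand-in for a real I/O completion port

def worker():
    # Each worker blocks on the "port"; one thread is woken per
    # completed I/O, so work spreads across however many cores exist.
    while True:
        op, data = completion_port.get()
        if op is None:
            break
        print('completed %s: %r' % (op, data))

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()

# In the real version the kernel posts these when I/O finishes:
completion_port.put(('recv', b'GET /index.html HTTP/1.1\r\n\r\n'))
completion_port.put(('send', b'HTTP/1.1 200 OK\r\n\r\n'))

for _ in threads:
    completion_port.put((None, None))
for t in threads:
    t.join()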
But good grief, it is orders of magnitude more complex at every level. A less stubborn version of me would have given up waaaay earlier. Glad I stuck with it, though; really happy with the results so far.
Trent.
From: gvanrossum@gmail.com [mailto:gvanrossum@gmail.com] On Behalf Of Guido van Rossum
Sent: Wednesday, November 26, 2014 4:49 PM
To: Trent Nelson
Cc: Paul Colomiets; python-ideas
Subject: Re: [Python-ideas] Asynchronous IO ideas for Python
Trent,
Can you post source for the regular and pyparallel HTTP servers you used?
On Wed, Nov 26, 2014 at 12:56 PM, Trent Nelson <trent@snakebite.org> wrote:
Relevant part of the video with the normal Python stats on the left and PyParallel on the right:
https://www.youtube.com/watch?v=4L4Ww3ROuro#t=838
Transcribed stats:
Regular Python HTTP server:
Thread Stats   Avg     Stdev   Max
  Latency      4.93ms  714us   10ms
  Req/Sec      552     154     1.1k
10,480 requests in 10s, 69MB
1048 reqs/sec, 6.9MB/s
PyParallel (4 core Windows VM):
Thread Stats   Avg     Stdev   Max
  Latency      2.41ms  531us   10ms
  Req/Sec      1.74k   183     2.33k
32,831 requests in 10s, 216MB
3263 reqs/sec, 21MB/s
So, roughly 3x the throughput (3263 vs 1048 reqs/sec) on 4 cores -- a bit less than linear scaling, which isn't too bad for a full debug build running in a VM.
-----Original Message-----
From: Trent Nelson
Sent: Wednesday, November 26, 2014 3:36 PM
To: 'Paul Colomiets'; python-ideas
Subject: RE: [Python-ideas] Asynchronous IO ideas for Python
Have you seen this?
https://speakerdeck.com/trent/pyparallel-how-we-removed-the-gil-and-exploite...
I spend the first 80-ish slides on async I/O.
(That was a year ago. I've done 2-3 sprints on it since then and have gotten it to a point where I can back up the claims with hard numbers on load testing benchmarks, demonstrated in the most recent video: https://www.youtube.com/watch?v=4L4Ww3ROuro.)
Trent.
-----Original Message-----
From: Python-ideas [mailto:python-ideas-bounces+trent=snakebite.org@python.org] On Behalf Of Paul Colomiets
Sent: Wednesday, November 26, 2014 12:35 PM
To: python-ideas
Subject: [Python-ideas] Asynchronous IO ideas for Python
Hi,
I've written an article about how I perceive the future of asynchronous I/O in Python. It's not something that should be directly incorporated into Python now, but I believe it's useful for the python-ideas list.
https://medium.com/@paulcolomiets/the-future-of-asynchronous-io-in-python-ce...
And a place for comments at Hacker News:
https://news.ycombinator.com/item?id=8662782
I hope this writeup is helpful :)
--
Paul
--
--Guido van Rossum (python.org/~guido)