Re: [Python-ideas] Concurrency Modules

Hi. I am back. First of all, thanks for your eager participation. I would like to pick up on Steve's and Mark's examples, as they seem to be very good illustrations of the issue I still have. Steve explained why asyncio is great and Mark explained why threading + multiprocessing is great, each from his own perspective and focusing on the internal implementation details.

To me, all approaches can now be fit into this sort of table. Please correct me if it's wrong (that is very important):

  # | code lives in | managed by
  --+---------------+-------------
  1 | processes     | os scheduler
  2 | threads       | os scheduler
  3 | tasks         | event loop

But the original question still stands: Which one to use?

Ignoring little details like 'shared state', 'custom prioritization', etc., they all look the same to me, and what it all comes down to are these nasty little details people try to explain so eagerly. Not saying that is a bad thing, but it has some implications for production code that I do not like, and in the following I am going to explain that.

Say we have decided on approach N because of some requirements (examples from here and there, guidelines given by smart people, customer needs, etc.) and wrote a hundred thousand lines of code. What if these requirements change 6 years in the future? What if the maintainer of approach N decides to change it in such a way that it is not compatible with our requirements anymore? From what I can see, there is no easy way 'back' to another approach. They all have different APIs for what is basically the same thing: 'executing a function and returning its precious result (the cake)'.

asyncio gives us the flexibility to choose a prioritization mechanism. Nice to have, because we are now independent of the os scheduler. But do we really ever need that? What is wrong with the os scheduler? Would that not mean that Mark had better switch to asyncio? We don't know whether we would ever need that in project A or project B. What now? Use asyncio just in case? Preemptively?

@Steve Thanks for that great explanation of how asyncio works and its relationship to threads/processes. But I still have a question: why can't we use threads for the cakes? (1 cake = 1 thread.) Not saying that asyncio would be a bad idea to use here, but couldn't we accomplish the same functionality by using threads?

I think, after we've settled the above questions, we should change the focus from "How do they work internally and what are the tiny differences?" (answered greatly by Mark) to "When do I use which one?" The latter question is what actually counts for production code. It is quite interesting to know and to ponder over all the differences, dependencies, corner cases, etc. However, when it actually comes down to 'executing a piece of code and returning its result', you end up deciding which approach to choose. You won't implement it in all 3 different ways just because it is great to see all the nasty little details click into place.

On Thursday, July 9, 2015 at 11:54:11 PM UTC+1, Sven R. Kunze wrote:
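To make the "different APIs, same cake" point concrete, here is a minimal sketch (an illustration only; the trivial bake() function is made up) of what 'executing a function and returning its result' looks like under each of the three approaches:

    import asyncio
    import multiprocessing
    import threading

    def bake(flavor):
        # Stand-in for the real work; the return value is the "cake".
        return "a %s cake" % flavor

    if __name__ == "__main__":
        # 1. processes - managed by the os scheduler
        p = multiprocessing.Process(target=bake, args=("chocolate",))
        p.start()
        p.join()  # the result is lost unless you add a Queue/Pipe yourself

        # 2. threads - managed by the os scheduler (and the GIL)
        t = threading.Thread(target=bake, args=("vanilla",))
        t.start()
        t.join()  # same problem: no result without extra plumbing

        # 3. tasks - managed by the event loop; here the result comes back
        async def bake_task(flavor):
            await asyncio.sleep(0)  # an explicit switch point
            return "a %s cake" % flavor

        loop = asyncio.get_event_loop()
        print(loop.run_until_complete(bake_task("lemon")))

Note how the thread and process variants need extra plumbing just to get the result back, while the coroutine simply returns it.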

"Sven R. Kunze" <srkunze@mail.de> wrote:
In CPython, threads are actually managed by a combination of the OS scheduler and the interpreter (which controls the GIL). Processes, on the other hand, are managed by the scheduler alone. Then there is the address space, which is shared for threads and tasks and private for processes.

  1 | processes | os scheduler
  2 | threads   | os scheduler and python interpreter
  3 | tasks     | event loop
Then you are screwed, which is a PITA for all concurrency code, not just that written in Python.

Sturla

"But I still have a question: why can't we use threads for the cakes? (1 cake = 1 thread)." Because that is the wrong equality - it's really 1 baker = 1 thread. Bakers aren't free, you have to pay for each one (memory, stack space), it will take time for each one to learn how your bakery works (startup time), and you will waste some of your own time coordinating them (interthread communication). You also only have one set of baking equipment (the GIL), buying another bakery is expensive (another process) and fitting more equipment into the current one is very complicated (subinterpreters). So you either pay a high price for 2 bakers = 2 cakes, or you accept 2 bakers = 1.5 cakes (in the same amount of time). It turns out that often 1 baker can do 1.5 cakes in the same time as well, and it's much easier to reason about and implement correctly. Hope that makes sense and I'm not stretching things too far. Guess I should make this into a talk for PyCon next year. Cheers, Steve Top-posted from my Windows Phone ________________________________ From: Sven R. Kunze<mailto:srkunze@mail.de> Sent: 7/24/2015 14:41 To: Mark Summerfield<mailto:m.n.summerfield@googlemail.com>; python-ideas@googlegroups.com<mailto:python-ideas@googlegroups.com>; python-ideas@python.org<mailto:python-ideas@python.org>; Steve Dower<mailto:Steve.Dower@microsoft.com> Subject: Re: [Python-ideas] Concurrency Modules Hi. I am back. First of all thanks for your eager participation. I would like to catch on on Steve's and Mark's examples as they seem to be very good illustrations of what issue I still have. Steve explained why asyncio is great and Mark explained why threading+multiprocessing is great. Each from his own perspective and focusing on the internal implementation details. To me, all approaches can now be fit into this sort of table. Please, correct me if it's wrong (that is very important): # | code lives in | managed by --+---------------+------------- 1 | processes | os scheduler 2 | threads | os scheduler 3 | tasks | event loop But the original question still stands: Which one to use? Ignoring little details like 'shared state', 'custom prioritization', etc., they all look the same to me and to what it all comes down are these little nasty details people try to explain so eagerly. Not saying that is a bad thing but it has some implications on production code I do not like and in the following I am going to explain that. Say, we have decided for approach N because of some requirements (examples from here and there, guidelines given by smart people, customer needs etc.) and wrote hundred thousand lines of code. What if these requirements change 6 years in the future? What if the maintainer of approach N decided to change it in such a way that is not compatible with our requirements anymore? From what I can see there is no easy way 'back' to use another approach. They all have different APIs, basically for: 'executing a function and returning its precious result (the cake)'. asyncio gives us the flexibility to choose a prioritization mechanism. Nice to have, because we are now independent on the os scheduler. But do we really ever need that? What is wrong with the os scheduler? Would that not mean that Mark better switches to asyncio? We don't know if we ever would need that in project A and project B. What now? Use asyncio just in case? Preemptively? @Steve Thanks for that great explanation of how asyncio works and its relationship to threads/processes. 

On Sat, Jul 25, 2015 at 3:28 PM, Steve Dower <Steve.Dower@microsoft.com> wrote:
Hope that makes sense and I'm not stretching things too far. Guess I should make this into a talk for PyCon next year.
Yes. And serve cake.

On a more serious note, I'd like to see some throughput tests for process-pool, thread-pool, and asyncio on a single thread. That'd make a great PyCon talk; make sure it's videoed, as I'd likely be linking to it a lot.

ChrisA
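A rough sketch of such a throughput test (illustrative only; job() simulates blocking i/o with a sleep, and real numbers will vary wildly with the actual workload):

    import asyncio
    import time
    from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

    N = 100

    def job(_):
        time.sleep(0.01)           # stand-in for blocking i/o

    async def ajob():
        await asyncio.sleep(0.01)  # the same wait, but cooperative

    def bench(executor_cls):
        start = time.time()
        with executor_cls(max_workers=10) as pool:
            list(pool.map(job, range(N)))
        return time.time() - start

    def bench_asyncio():
        loop = asyncio.get_event_loop()
        start = time.time()
        loop.run_until_complete(asyncio.gather(*[ajob() for _ in range(N)]))
        return time.time() - start

    if __name__ == "__main__":  # required for process pools on Windows
        print("processes:", bench(ProcessPoolExecutor))
        print("threads:  ", bench(ThreadPoolExecutor))
        print("asyncio:  ", bench_asyncio())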

On 25 July 2015 at 15:32, Chris Angelico <rosuav@gmail.com> wrote:
Dave Beazley's "Python Concurrency from the Ground Up" talk at PyCon US this year was almost exactly that: https://us.pycon.org/2015/schedule/presentation/374/

Video: https://www.youtube.com/watch?v=MCs5OvhV9S4
Demo code: https://github.com/dabeaz/concurrencylive

There's a direct causal link between that talk and our renewed interest in getting subinterpreters up to a point where they can offer most of the low overhead of interpreter threads with most of the memory safety of operating system level processes :)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Nice, that really clears it up for me. So, let's summarize what we have so far:

                 | 1                       | 2                          | 3
  ---------------+-------------------------+----------------------------+------------------------
  code lives in  | processes               | threads                    | coroutines
  managed by     | os scheduler            | os scheduler + interpreter | customizable event loop
                 |                         |                            |
  parallelism    | yes                     | depends (cf. GIL)          | no
  shared state   | no                      | yes                        | yes
                 |                         |                            |
  startup impact | biggest                 | medium                     | smallest
  cpu impact     | biggest                 | medium                     | smallest
  memory impact  | biggest                 | medium                     | smallest
                 |                         |                            |
  purpose        | cpu-bound tasks         | i/o-bound tasks            | ???
                 |                         |                            |
  module pool    | multiprocessing.Pool    | multiprocessing.dummy.Pool | ???
  module solo    | multiprocessing.Process | threading.Thread           | ???

Please, feel free to amend/correct the table and fill in the ??? parts if you know better.

On 25.07.2015 07:28, Steve Dower wrote:

On Jul 25 2015, "Sven R. Kunze" <srkunze-7y4VAllY4QU@public.gmane.org> wrote:
I don't think any of these is correct. Unfortunately, I also don't think there even is a correct version; the differences are simply not that clear-cut.

On Unix, process startup cost can be high if you do fork() + exec(), but if you just fork, it's as cheap as a thread. With asyncio, it's not clear to me what exactly you'd define as the "startup impact" (the creation of a future maybe? Or setting up the event loop?).

"CPU impact" as a category doesn't make any sense to me. If you execute the same code, it's going to take the same amount of (cumulative) CPU time, no matter whether this code runs in a separate thread, a separate process, or asynchronously.

"Memory impact" is probably highest for separate processes, but I don't see an obvious difference when using threads vs asyncio. Where did you get this from?

As far as purpose is concerned, pretty much the only limitation is that asyncio is not suitable for cpu-bound tasks. Any other combination is possible and also most appropriate in specific circumstances.

Best,
-Nikolaus

--
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

»Time flies like an arrow, fruit flies like a Banana.«

Thanks, Nikolaus. Mostly I refer to things Steve brought up in his analogies (two recent posts), so I might have interpreted them the wrong way.

On 26.07.2015 02:58, Nikolaus Rath wrote:

>> What's necessary to get a process up and running a piece of code compared to what's necessary to get asyncio up and running the same piece of code.

Steve: "Bakers aren't free, you have to pay for each one (memory, stack space), it will take time for each one to learn how your bakery works (startup time)"
I take from this that asyncio is suitable for heavily i/o-bound work, threads for mixed cpu-/i/o-bound work, and processes for mainly cpu-bound work.

Best,
Sven

On Jul 26, 2015, at 12:07, Sven R. Kunze <srkunze@mail.de> wrote:
One huge thing you're missing is cooperative vs. preemptive switching. In asyncio, you know that no other task is going to run until you reach the next explicit yield point; with threads, it can happen after any bytecode; with processes, it can happen anywhere at all. This means that if you're using shared state, your locking strategy can be simpler, more efficient, and easier to prove correct with asyncio. And likewise, if you need to sequence things, it can be easier with asyncio (although often the simplest way to do that in any mechanism is to make each of those things into a task and just chain futures together).
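A minimal sketch of that difference (an illustration using a made-up shared counter): the threaded increment needs a lock because preemption can happen between the read and the write, while the asyncio version is safe without one because no other task can run until the next await:

    import asyncio
    import threading

    counter = 0

    # Threads: the read-modify-write can be interrupted after any
    # bytecode, so the increment needs a lock to be correct.
    lock = threading.Lock()

    def threaded_increment():
        global counter
        with lock:
            counter += 1

    # asyncio: no other task runs until an explicit await, so the same
    # increment is safe without a lock.
    async def async_increment():
        global counter
        counter += 1            # no await between read and write
        await asyncio.sleep(0)  # only here may another task be scheduled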
It's your choice: just fork, spawn (fork+exec), or spawn a special "server" process to fork copies off. (Except on Windows, where spawn is the only possibility.) How do you know which one to choose? Well, you have to learn the differences to make a decision. Forking is fastest, and it means some kinds of globals are automatically shared, but it can lead to a variety of problems, especially if you're also using threads (and some libraries may use threads without you knowing about it--especially on OS X, where a variety of Cocoa APIs sometimes use threads and sometimes don't).
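In the Python 3.4+ API, this choice is made via start methods; a minimal sketch (the start-method names are the documented ones, the work() function is made up):

    import multiprocessing as mp

    def work(x):
        return x * x

    if __name__ == "__main__":
        # Pick a start method explicitly: "fork" is Unix-only, "spawn"
        # works everywhere, and "forkserver" is the special server
        # process that forks copies off on demand.
        ctx = mp.get_context("spawn")
        with ctx.Pool(4) as pool:
            print(pool.map(work, range(10)))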
Yes. There's always a context switch going on, but a cooperative context switch can swap a lot less, and can do it without having to cross the user-kernel boundary.
The overhead for the contexts themselves is tiny--but one of the things each thread context points at is the stack, and that may be 1MB or even more. So, a program with 500 threads may be using half a GB just for stacks. That may not be as bad as it sounds, because if you never use most of the stack, most of it may never actually get paged to physical memory. (But on 32-bit OS's, you're still using up a quarter of your page table space.)
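For what it's worth, the per-thread stack reservation can be tuned with threading.stack_size(); a small sketch (the 256 KiB figure is an arbitrary example, and platforms impose their own minimums and alignment rules):

    import threading

    # Request a smaller per-thread stack reservation (in bytes) before
    # creating threads; calling with no argument queries the setting.
    # Treat this as a hint: platforms commonly enforce a 32 KiB minimum.
    threading.stack_size(256 * 1024)

    threads = [threading.Thread(target=lambda: None) for _ in range(500)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()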
Asyncio is best for massively concurrent i/o-bound code that does pretty much the same thing for each task, like a web server that has to handle thousands of users. Threads are also used for i/o-bound code; it's more a matter of how you want to write the code than of what it does.

Processes, on the other hand, are the only way (other than a C extension that releases the GIL--or, of course, using a different Python interpreter) to get CPU parallelism. So, that part is right. But there are other advantages of using processes sometimes: it guarantees no accidental shared state; it gives you a way to "recycle" your workers if you might call some C library that can crash or leak memory or corrupt things; and it gives you another VM space (which can be a big deal on 32-bit platforms). Also, you can write multiprocessing code as if you were writing distributed code, which makes it easier to turn into real distributed code if you later need to do that.

Wow. Thanks, Andrew, for this very informative response. I am going to integrate your thoughts into the table later and re-post it. Just one question:

On 26.07.2015 12:29, Andrew Barnert wrote:
It's your choice: just fork, spawn (fork+exec), or spawn a special "server" process to fork copies off. (Except on Windows, where spawn is the only possibility.)
How do you know which one to choose? Well, you have to learn the differences to make a decision. Forking is fastest, and it means some kinds of globals are automatically shared, but it can lead to a variety of problems, especially if you're also using threads (and some libraries may use threads without you knowing about it--especially on OS X, where a variety of Cocoa APIs sometimes use threads and sometimes don't).
If I read the documentation of https://docs.python.org/2/library/multiprocessing.html#module-multiprocessin... for instance, I do not see a way to specify my choice. There, I pass a function and this function is executed in another process/thread. Is that just forking?

On 26 July 2015 at 21:44, Sven R. Kunze <srkunze@mail.de> wrote:
The Python 2.7 multiprocessing module API is ~5 years old at this point; Andrew's referring to the API in Python 3.4+: https://docs.python.org/2/library/multiprocessing.html#module-multiprocessin...

As far as the other benefits of asyncio go, one of the perks is that you can stop all processing smoothly just by stopping the event loop, and then they'll all resume together later. This gives you a *lot* more predictability than using threads or processes, which genuinely execute in parallel.

After the previous discussion, I wrote http://www.curiousefficiency.org/posts/2015/07/asyncio-tcp-echo-server.html to attempt to convey some of the *practical* benefits of using asyncio to manage interleaved network operations within a single thread. While in the blog post I'm just playing with TCP clients and echo servers at the interactive prompt, it wouldn't be too hard to adapt those techniques to running network client and server testing code as part of a synchronous test suite.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
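The shape of the example in that post is roughly this (a reconstruction using the stock asyncio streams API, not the blog post's actual code): all clients are interleaved in one thread by the event loop.

    import asyncio

    async def handle_echo(reader, writer):
        data = await reader.read(100)  # wait for data from one client
        writer.write(data)             # echo it straight back
        await writer.drain()
        writer.close()

    loop = asyncio.get_event_loop()
    server = loop.run_until_complete(
        asyncio.start_server(handle_echo, "127.0.0.1", 8888))
    try:
        loop.run_forever()             # serves every client in this thread
    finally:
        server.close()
        loop.run_until_complete(server.wait_closed())
        loop.close()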

On 27 July 2015 at 00:28, Nick Coghlan <ncoghlan@gmail.com> wrote:
It would help if I actually replaced the link with the one I intended to provide...: https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-me... Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Jul 26, 2015, at 13:44, Sven R. Kunze <srkunze@mail.de> wrote:
That's because you're reading the documentation for Python 2.7. In 2.7, you always get fork on Unix and spawn on Windows; the choice of start methods was added in 3.3 or 3.4.
There, I pass a function and this function is executed in another process/thread. Is that just forking?
If you pass a function to a Process in 2.7, on Unix, that's just forking; the parent process returns while the child process calls your function and exits. If you pass it to a Pool, all the pool processes are forked, but they keep running and pick new tasks off a queue.

On Windows, on the other hand, a new Process calls CreateProcess (the equivalent of fork then exec, or posix_spawn, on Unix) to launch an entirely new Python interpreter, which then imports your module and calls your function. With a Pool, all the new processes get started the same way, then keep running and pick new tasks off a queue.
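A minimal sketch of the two shapes Andrew describes (illustrative; the bake() function is made up):

    import multiprocessing as mp

    def bake(flavor):
        return "a %s cake" % flavor

    if __name__ == "__main__":  # required under spawn, i.e. on Windows
        # One-shot: the child runs bake() once and exits.
        p = mp.Process(target=bake, args=("chocolate",))
        p.start()
        p.join()

        # Pool: the workers are started once, then keep pulling tasks
        # off an internal queue until the pool is closed.
        with mp.Pool(4) as pool:
            print(pool.map(bake, ["lemon", "vanilla", "carrot"]))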

Big thanks to you, Andrew, Nick and Nikolaus for the latest comments and ideas. I think the table is in very good shape now, and the questions I started this thread with are now answered (at least) to my satisfaction. The relationships are clear (they are all different modules for the same overall purpose), they have different fields of application (cpu vs io), and they have slightly different properties. How do we proceed from here?

Btw. the number of different approaches (currently 3, but I assume this will go up in the future) is quite unfortunate. What's even more unfortunate is the missing exchangeability due to API differences and the lack of a common syntax for executing functions concurrently. Something that struck me as odd was that asyncio got syntactic sugar although the module itself is actually quite young compared to the support of processes and of threads. These two alternatives actually have not a single bit of syntax support even now.

On 26.07.2015 17:00, Andrew Barnert wrote:

On Jul 26, 2015, at 23:54, Sven R. Kunze <srkunze@mail.de> wrote:
It may go up to four with subinterpreters or something like PyParallel, but I can't see much reason for it to go beyond that in the foreseeable future. In theory, there are two possible things missing here: preemptive, non-GIL-restricted, CPU-parallel switching, with implicit shared data (like threads in, say, Java), and the same without implicit shared data but still with efficient explicit shared data (like Erlang processes). But I don't think the former will ever happen in CPython, and in other interpreters it will just use the same API that threads do today (as is already true for Jython).
What's even more unfortunate is the missing exchangeability due to API differences and the lack of a common syntax for executing functions concurrently.
But you don't really need any special syntax. Submitting a function to an executor and getting back a future is only tricky in languages like Java because they don't have first-class functions. In Python, it's just an ordinary function call that hands you back a future.
Something that struck me as odd was that asyncio got syntactic sugar although the module itself is actually quite young compared to the support of processes and of threads. These two alternatives actually have not a single bit of syntax support even now.
The other two don't need that syntactic support. The point of the await keyword is to mark explicit switch points (yield from also does that, but it's also used in traditional generators, which can be confusing), while async marks functions that need to be awaited (yield or yield from also does that, but again, that can be confusing--plus, sometimes you need to make a function awaitable even though it doesn't await anything, which in 3.4 required either a meaningless yield or a special decorator).

The fact that coroutines and generators are the same thing under the covers is a very nifty feature for interpreter implementors and maybe library implementors, but end users who just want to write coroutines shouldn't have to understand that. (This was obvious to Greg Ewing when he proposed cofunctions a few years ago, but it looks like nobody else really got it until people had experience using asyncio.)

Since threads and processes both do implicit switching, they have no use for anything similar. Every expression may switch, not just await expressions, and every function may get switched out, not just async functions. One way to look at it is that the syntactic support makes asyncio look almost as nice as threads--as nice as it can be, given that switches have to be explicit. (You can always use a third-party greenlet-based library like gevent to give you implicit but still cooperative switching, which looks just like threads--although that can be misleading, because it doesn't act just like threads.)
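For readers who haven't seen both spellings, a minimal side-by-side sketch (illustrative) of the 3.4 decorator style Andrew mentions and the 3.5 syntax:

    import asyncio

    # Python 3.4 style: a coroutine is "just" a generator, and a function
    # that awaits nothing still needs the decorator to be awaitable.
    @asyncio.coroutine
    def fetch_34():
        yield from asyncio.sleep(1)   # explicit switch point
        return "cake"

    # Python 3.5 style: async/await marks the same things unambiguously.
    async def fetch_35():
        await asyncio.sleep(1)        # explicit switch point
        return "cake"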

On 27 July 2015 at 07:54, Sven R. Kunze <srkunze@mail.de> wrote:
Their shared abstraction layer is the concurrent.futures module: https://docs.python.org/3/library/concurrent.futures.html (available for Python 2 as the "futures" module on PyPI)

For "call and response" use cases involving pools of worker threads or processes, concurrent.futures is a better option than hand-rolling your own pool management, request dispatch and response processing code. That model is also integrated into the asyncio event loop to support dispatching blocking tasks to a background thread or process.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
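A minimal sketch of that shared layer (illustrative; the work() function is made up): the same submit-and-wait pattern covers threads and processes, and the event loop can dispatch blocking work to either kind of pool.

    import asyncio
    from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

    def work(x):
        return x * x  # stand-in for a blocking or cpu-heavy call

    if __name__ == "__main__":
        # Identical "call and response" API for threads and processes:
        with ThreadPoolExecutor(max_workers=4) as tpool:
            print(tpool.submit(work, 6).result())
        with ProcessPoolExecutor(max_workers=4) as ppool:
            print(ppool.submit(work, 7).result())

        # The asyncio integration: dispatch the blocking task to a pool
        # (None means the loop's default thread pool) from the event loop.
        loop = asyncio.get_event_loop()
        print(loop.run_until_complete(loop.run_in_executor(None, work, 8)))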

On 25 July 2015 at 18:37, Sven R. Kunze <srkunze@mail.de> wrote:
Nice, that really clears it up for me. So, let's summarize what we have so far:
Just as a note - even given the various provisos and "it's not that simple" comments that have been made, I found this table extremely useful. Like any such high-level summary, I expect to have to take it with a pinch of salt, but I don't see that as an issue - anyone who doesn't fully appreciate that there are subtleties, probably wouldn't read a longer explanation anyway. So many thanks for taking the time to put this together (and for continuing to improve it). +1 on something like this ending up in the Python docs somewhere. Paul

Next update:

Improving Performance by Running Independent Tasks Concurrently - A Survey

                 | processes               | threads                    | coroutines
  ---------------+-------------------------+----------------------------+-------------------------
  purpose        | cpu-bound tasks         | cpu- & i/o-bound tasks     | i/o-bound tasks
                 |                         |                            |
  managed by     | os scheduler            | os scheduler + interpreter | customizable event loop
  controllable   | no                      | no                         | yes
                 |                         |                            |
  parallelism    | yes                     | depends (cf. GIL)          | no
  switching      | at any time             | after any bytecode         | at user-defined points
  shared state   | no                      | yes                        | yes
                 |                         |                            |
  startup impact | biggest/medium*         | medium                     | smallest
  cpu impact**   | biggest                 | medium                     | smallest
  memory impact  | biggest                 | medium                     | smallest
                 |                         |                            |
  pool module    | multiprocessing.Pool    | multiprocessing.dummy.Pool | asyncio.BaseEventLoop
  solo module    | multiprocessing.Process | threading.Thread           | ---

*  biggest - if spawn (fork+exec), and always on Windows
   medium - if fork alone
** due to context switching

On 26.07.2015 14:18, Paul Moore wrote:

Hello,

This discussion is pretty interesting as an attempt to list when each architecture is the most efficient, based on the need. However, just a small precision: multiprocess/multiworker isn't antinomic with AsyncIO: you can have an event loop in each process to try to combine the "best" of the two "worlds". As usual in IT, it isn't a silver bullet that will cure cancer, but, at least to my understanding, it should be useful for some business needs like server daemons.

It isn't a crazy new idea; this design pattern has been implemented for a long time, at least in Nginx: http://www.aosabook.org/en/nginx.html

If you are interested in using this design pattern to build an HTTP server only, you can easily use aiohttp.web + gunicorn: http://aiohttp.readthedocs.org/en/stable/gunicorn.html
If you want to use any AsyncIO server protocol (aiohttp.web, panoramisk, asyncssh, irc3d), you can use API-Hour: http://www.api-hour.io
And if you want to implement this design pattern by yourself, be my guest; if a Python peon like me has implemented API-Hour, everybody on this mailing-list can do that.

For communication between workers, I use Redis; however, you have plenty of solutions for that. As usual, before selecting a communication mechanism you should benchmark based on your use cases: some results may surprise you.

Have a nice week.

PS: Thank you everybody for EuroPython, it was amazing ;-)

--
Ludovic Gasc (GMLudo)
http://www.gmludo.eu/

2015-07-26 23:26 GMT+02:00 Sven R. Kunze <srkunze@mail.de>:
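A bare-bones sketch of that pattern (illustrative only, not API-Hour code): one asyncio event loop per worker process. For simplicity each worker listens on its own port here; a real deployment would share one listening socket (e.g. via SO_REUSEPORT) or sit behind a load balancer.

    import asyncio
    import multiprocessing as mp

    async def handle(reader, writer):
        writer.write(await reader.read(100))  # trivial echo handler
        await writer.drain()
        writer.close()

    def worker(port):
        # Each process gets its own event loop, nginx-style.
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        server = loop.run_until_complete(
            asyncio.start_server(handle, "127.0.0.1", port))
        loop.run_forever()

    if __name__ == "__main__":
        workers = [mp.Process(target=worker, args=(8000 + i,))
                   for i in range(4)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()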

Thanks Ludovic. On 28.07.2015 22:15, Ludovic Gasc wrote:
I think that should be clear for everybody using any of these modules. But you are right to point it out explicitly.
I hope not to disappoint you. I actually strive not to do that manually for each tiny bit of program (assuming there are many places in the code base where a project could benefit from concurrency).

Personally, I use benchmarks for optimizing problematic code. But if Python were able to do that without me choosing the right and correctly configured approach (to be determined by benchmarks), that would be awesome. As usual, that needs time to evolve.

I found that benchmark-driven improvements do not last forever, unfortunately, and that most of the time nobody is able to keep track of everything. So, as soon as something changes, you need to start anew. That is not acceptable to me. Btw. that is also a reason why I said recently (in another topic on this list), 'if Python could optimize that without my attention that would be great'.

The simplest solution, and therefore the easiest to comprehend for all team members, is the way to go. If that is not efficient enough, that is actually a Python issue. Readability counts most. And fortunately, in most cases that attitude works perfectly with Python. :)

2015-07-29 8:29 GMT+02:00 Sven R. Kunze <srkunze@mail.de>:
Based on my discussions at EuroPython and PyCON-US, it's certainly clear to the "middle-class management" of the Python community, but not really to the typical Python end-dev: several persons tried to troll me that multiprocessing is more efficient than AsyncIO. To me, it was an opportunity to transform the negative troll attempt into a positive exchange about efficiency, and to understand before trolling back ;-) More seriously, I have the feeling that it isn't very clear to everybody, especially to newcomers.
Don't worry about that, and don't hesitate to "hit"; I have a very strong shield to avoid disappointments ;-)
I actually strive not to do that manually for each tiny bit of program
You're right, micro-benchmarks aren't a good approach for deciding the macro architecture of an application.
(assuming there are many places in the code base where a project could benefit from concurrency).
As usual, it depends on your architecture/needs. If you do a lot more network i/o than CPU work, the network waiting time argues for more concurrency.
It should be technically possible; however, I don't believe too much in implicit optimizations hidden from the end-dev: it's very complicated to hide the magic, few people have the skills to implement that, and the day you have an issue, you're almost alone. See PyPy: certainly one day they will provide a good solution for that, but it isn't trivial to implement--look at the time they have needed. Over time, I believe more and more in educating developers, helping them to understand the big picture and use optimizations explicitly. The learning curve is steeper; however, in the end you have more autonomous developers who will resolve more problems and be less afraid to break the standard frame to innovate. I don't have scientific proof of that, it's only a feeling. However, again, the two approaches aren't antinomic: each time we get an automagic optimization without side effects, like computed gotos, I will use it.

I found that benchmark-driven improvements do not last forever,
I fully agree with you: while it works, don't break it just for pleasure. Moreover, instead of trashing your full stack for efficiency reasons (for example, dropping all your Python code to migrate to Go), where you need to relearn everything, you should maybe first find a solution within your actual stack. At least to me, it was far less complicated to migrate to Python 3, the multiworker pattern and AsyncIO than to migrate to Go/NodeJS/Erlang/... Moreover, with a niche language it's more complicated to find developers, and it's harder to spot impostors: some people use barely-used alternative languages only to try to convince others that they are good developers. Another solution is to add more servers to handle the load, but that isn't always the solution with the smallest TCO; don't forget to count sysadmin costs and the complexity of debugging when you have an issue in production.
Again, I strongly agree with you; however, given the age of Python and the size of the performance community we have (PyPy, Numba, Cython, Pyston...), I believe that fewer and fewer automagic optimizations without side effects remain to be found. Not impossible, but harder and harder (I secretly hope that somebody will prove me wrong ;-) ). Maybe we could "steal" some optimizations from other languages? I don't have the technical level to help with that; I'm more a business-logic dev than a low-level dev.
Again and again, I agree with you: the combination of community size (a big toolbox and a lot of developers) and newcomer-friendly readability is clearly a big win-win, at least to me. The only issue I had was efficiency: with the success of our company, we couldn't let the programming language/framework stop us from quickly building efficient daemons, which is why I quickly dropped PHP and Ruby in the past. Now, with our new stack, based on the trusted predictions of our fortune-telling telephony service department, we should survive a long time before having to replace some Python parts with C or anything else. Have a nice weekend.

Ludovic Gasc <gmludo@gmail.com> writes:
Do you mean those trolls that measure first and then draw conclusions ;) Could you provide an evidence-based description of the issue, such as http://www.mailinator.com/tymaPaulMultithreaded.pdf but for Python?

On Aug 2, 2015, at 10:09, Akira Li <4kir4.1i@gmail.com> wrote:
The whole point of that post, and of the older von Behrens paper it references, is that a threading-like API can be built that uses explicit cooperative threading and dynamic stacks, and that it avoids all of the problems with threads while retaining almost all of the advantages. That sounds great. Which is probably why it's exactly what Python's asyncio does. Just like von Behrens's thread package, it uses an event loop around poll (or something better) to drive a scheduler for coroutines. The only difference is that Python has coroutines natively, unlike Java or C, and with a nice API, so there's no reason not to hide that API. (But if you really want to, you can just use gevent without its monkeypatching library, and then you've got an almost exact equivalent.)

In other words, in the terms used by mailinator, asyncio is exactly the thread package they suggest using instead of an event package. Their post is evidence that something like asyncio can be built for Java, and we don't need evidence that something like asyncio could be built for Python, because Guido already built it. You could compare asyncio with the coroutine API to asyncio with the lower-level callback API (or Twisted with inline callbacks to Twisted with coroutines, etc.), but what would be the point?

Of course multiprocessing vs. asyncio is a completely different question. Now that we have reasonably similar, well-polished APIs for both, people can start running comparisons. But it's pretty easy to predict what they'll find: for some applications, multiprocessing is better; for others, asyncio is better; for others, a simple combination of the two easily beats either alone; and for others, it really doesn't make much difference because concurrency isn't even remotely the key issue. The only thing that really matters to anyone is which is better for _their_ application, and that's something you can't extrapolate from a completely different test any better than you can guess it.

+14. Thank you, Andrew, for your answer.

@Akira: Measure, profile, and benchmark your projects: the learning curve is steeper; however, in the end you'll be able to filter the ideas from the community for your projects more easily. A lot of "good" practices are counter-efficient, like micro-services: if you push the micro-services pattern to the extreme, you'll add latency, because you'll generate more internal traffic for one HTTP request. That doesn't mean you must have a monolithic daemon, only that you should slice your services pragmatically. I have a concrete example of an open source product that abuses this pattern and where I've measured concrete efficiency impacts before and after the micro-services introduction. I can't cite its name because we use it in production and I want to keep a good relationship with them.

--
Ludovic Gasc (GMLudo)
http://www.gmludo.eu/

2015-08-03 3:08 GMT+02:00 Andrew Barnert via Python-ideas <python-ideas@python.org>:

"Sven R. Kunze" <srkunze@mail.de> wrote:
In CPython threads are actually managed by a combination of the OS scheduler and the interpreter (which controls the GIL). Processes on the other hand are only managed by the scheduler. Then there is the address space, which is shared for threads and tasks and private for processes. 1 | processes | os scheduler 2 | threads | os scheduler and python interpreter 3 | tasks | event loop
Then you are screwed, which is a PITA for all concurrency code, not just the one written in Python. Sturla

"But I still have a question: why can't we use threads for the cakes? (1 cake = 1 thread)." Because that is the wrong equality - it's really 1 baker = 1 thread. Bakers aren't free, you have to pay for each one (memory, stack space), it will take time for each one to learn how your bakery works (startup time), and you will waste some of your own time coordinating them (interthread communication). You also only have one set of baking equipment (the GIL), buying another bakery is expensive (another process) and fitting more equipment into the current one is very complicated (subinterpreters). So you either pay a high price for 2 bakers = 2 cakes, or you accept 2 bakers = 1.5 cakes (in the same amount of time). It turns out that often 1 baker can do 1.5 cakes in the same time as well, and it's much easier to reason about and implement correctly. Hope that makes sense and I'm not stretching things too far. Guess I should make this into a talk for PyCon next year. Cheers, Steve Top-posted from my Windows Phone ________________________________ From: Sven R. Kunze<mailto:srkunze@mail.de> Sent: 7/24/2015 14:41 To: Mark Summerfield<mailto:m.n.summerfield@googlemail.com>; python-ideas@googlegroups.com<mailto:python-ideas@googlegroups.com>; python-ideas@python.org<mailto:python-ideas@python.org>; Steve Dower<mailto:Steve.Dower@microsoft.com> Subject: Re: [Python-ideas] Concurrency Modules Hi. I am back. First of all thanks for your eager participation. I would like to catch on on Steve's and Mark's examples as they seem to be very good illustrations of what issue I still have. Steve explained why asyncio is great and Mark explained why threading+multiprocessing is great. Each from his own perspective and focusing on the internal implementation details. To me, all approaches can now be fit into this sort of table. Please, correct me if it's wrong (that is very important): # | code lives in | managed by --+---------------+------------- 1 | processes | os scheduler 2 | threads | os scheduler 3 | tasks | event loop But the original question still stands: Which one to use? Ignoring little details like 'shared state', 'custom prioritization', etc., they all look the same to me and to what it all comes down are these little nasty details people try to explain so eagerly. Not saying that is a bad thing but it has some implications on production code I do not like and in the following I am going to explain that. Say, we have decided for approach N because of some requirements (examples from here and there, guidelines given by smart people, customer needs etc.) and wrote hundred thousand lines of code. What if these requirements change 6 years in the future? What if the maintainer of approach N decided to change it in such a way that is not compatible with our requirements anymore? From what I can see there is no easy way 'back' to use another approach. They all have different APIs, basically for: 'executing a function and returning its precious result (the cake)'. asyncio gives us the flexibility to choose a prioritization mechanism. Nice to have, because we are now independent on the os scheduler. But do we really ever need that? What is wrong with the os scheduler? Would that not mean that Mark better switches to asyncio? We don't know if we ever would need that in project A and project B. What now? Use asyncio just in case? Preemptively? @Steve Thanks for that great explanation of how asyncio works and its relationship to threads/processes. 
But I still have a question: why can't we use threads for the cakes? (1 cake = 1 thread). Not saying that asyncio would be a bad idea to use here, but couldn't we accomplish the same functionality by using threads? I think, after we've settled the above questions, we should change the focus from How do they work internally and what are the tiny differences? (answered greatly by Mark) to When do I use which one? The latter question actually is what counts for production code. It actually is quite interesting to know and to ponder over all the differences, dependencies, corner cases etc. However, when it actually comes down to 'executing a piece of code and returning its result', you end up deciding which approach to choose. You won't implement all 3 different ways just because it is great to see all the nasty little details to click in. On Thursday, July 9, 2015 at 11:54:11 PM UTC+1, Sven R. Kunze wrote:

On Sat, Jul 25, 2015 at 3:28 PM, Steve Dower <Steve.Dower@microsoft.com> wrote:
Hope that makes sense and I'm not stretching things too far. Guess I should make this into a talk for PyCon next year.
Yes. And serve cake. On a more serious note, I'd like to see some throughput tests for process-pool, thread-pool, and asyncio on a single thread. That'd make a great PyCon talk; make sure it's videoed, as I'd likely be linking to it a lot. ChrisA

On 25 July 2015 at 15:32, Chris Angelico <rosuav@gmail.com> wrote:
Dave Beazley's "Python Concurrency from the Ground Up" talk at PyCon US this year was almost exactly that: https://us.pycon.org/2015/schedule/presentation/374/ Video: https://www.youtube.com/watch?v=MCs5OvhV9S4 Demo code: https://github.com/dabeaz/concurrencylive There's a direct causal link between that talk and our renewed interest in getting subinterpreters up to a point where they can offer most of the low overhead of interpreter threads with most of the memory safety of operating system level processes :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Nice, that really clears it up for me. So, let's summarize what we have so far: | 1 | 2 | 3 ---------------+-------------------------+----------------------------+------------------------ code lives in | processes | threads | coroutines managed by | os scheduler | os scheduler + interpreter | customizable event loop | | | parallelism | yes | depends (cf. GIL) | no shared state | no | yes | yes | | | startup impact | biggest | medium | smallest cpu impact | biggest | medium | smallest memory impact | biggest | medium | smallest | | | purpose | cpu-bound tasks | i/o-bound tasks | ??? | | | module pool | multiprocessing.Pool | multiprocessing.dummy.Pool | ??? module solo | multiprocessing.Process | threading.Thread | ??? Please, feel free to amend/correct the table and fill in the ??? parts if you know better. On 25.07.2015 07:28, Steve Dower wrote:

On Jul 25 2015, "Sven R. Kunze" <srkunze-7y4VAllY4QU@public.gmane.org> wrote:
I don't think any of these is correct. Unfortunately, I also don't think there even is a correct version, the differences are simply not so clear-cut. On Unix, Process startup-cost can be high if you do fork() + exec(), but if you just fork, it's as cheap as a thread. With asyncio, it's not clear to me what exactly you'd define as the "startup impact" (the creation of a future maybe? Or setting up the event loop?). "CPU impact" as a category doesn't make any sense to me. If you execute the same code it's going to take the same amount of (cumulative) CPU time, no matter if this code runs in a separate thread, separate process, or asynchronously. "memory impact" is probably highest for separate processes, but I don't see an obvious difference when using threads vs asyncio. Where did you get this from? As far as purpose is concerned, pretty much the only limitation is that asyncio is not suitable for cpu-bound tasks. Any other combination is possible and also most appropriate in specific circumstances. Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F »Time flies like an arrow, fruit flies like a Banana.«

Thanks, Nikolaus. Mostly I refer to things Steve brought up in his analogies (two recent posts). So, I might interpreted them the wrong way. On 26.07.2015 02:58, Nikolaus Rath wrote: them. the approaches. >> What's necessary to get a process up and running a piece of code compared to what's necessary to get asyncio up and running the same piece of code. Steve: "Bakers aren't free, you have to pay for each one (memory, stack space), it will take time for each one to learn how your bakery works (startup time)"
I take from this that asyncio is suitable for heavy i/o-bound, threads are for cpu/io-bound and processes for mainly cpu-bound. Best, Sven

On Jul 26, 2015, at 12:07, Sven R. Kunze <srkunze@mail.de> wrote:
One huge thing you're missing is cooperative vs. preemptive switching. In asyncio, you know that no other task is going to run until you reach the next explicit yield point; with threads, it can happen after any bytecode; with processes, it can happen anywhere at all. This means that if you're using shared state, your locking strategy can be simpler, more efficient, and easier to prove correct with asyncio. And likewise, if you need to sequence things, it can be easier with asyncio (although often the simplest way to do that in any mechanism is to make each of those things into a task and just chain futures together).
It's your choice: just fork, spawn (fork+exec), or spawn a special "server" process to fork copies off. (Except on Windows, where spawn is the only possibility.) How do you know which one to choose? Well, you have to learn the differences to make a decision. Forking is fastest, and it means some kinds of globals are automatically shared, but it can lead to a variety of problems, especially if you're also using threads (and some libraries may use threads without you knowing about it--especially on OS X, where a variety of Cocoa APIs sometimes use threads and sometimes don't).
Yes. There's always a context switch going on, but a cooperative context switch can swap a lot less, and can do it without having to cross the user-kernel boundary.
The overhead for the contexts themselves is tiny--but one of the things each thread context points at is the stack, and that may be 1MB or even more. So, a program with 500 threads may be using half a GB just for stacks. That may not be as bad as it sounds, because if you never use most of the stack, most of it may never actually get paged to physical memory. (But on 32-bit OS's, you're still using up a quarter of your page table space.)
Asyncio is best for massively concurrent i/o bound code that does pretty much the same thing for each one, like a web server that has to handle thousands of users. Threads are also used for i/o bound code; it's more a matter of how you want to write the code than of what it does. Processes, on the other hand, are the only way (other than a C extension that releases the GIL--or, of course, using a different Python interpreter) to get CPU parallelism. So, that part is right. But there are other advantages of using processes sometimes--it guarantees no accidental shared state; it gives you a way to "recycle" your workers if you might call some C library that can crash or leak memory or corrupt things; it gives you another VM space (which can be a big deal in 32-bit platforms). Also, you can write multiprocessing code as if you were writing distributed code, which makes it easier to turn into real distributed code if you later need to do that.

Wow. Thanks, Andrew for this very informative response. I am going to integrate your thoughts in to the table later and re-post it again. Just one question: On 26.07.2015 12:29, Andrew Barnert wrote:
It's your choice: just fork, spawn (fork+exec), or spawn a special "server" process to fork copies off. (Except on Windows, where spawn is the only possibility.)
How do you know which one to choose? Well, you have to learn the differences to make a decision. Forking is fastest, and it means some kinds of globals are automatically shared, but it can lead to a variety of problems, especially if you're also using threads (and some libraries may use threads without you knowing about it--especially on OS X, where a variety of Cocoa APIs sometimes use threads and sometimes don't).
If I read the documentation of https://docs.python.org/2/library/multiprocessing.html#module-multiprocessin... for instance, I do not see a way to specify my choice. There, I pass a function and this function is executed in another process/thread. Is that just forking?

On 26 July 2015 at 21:44, Sven R. Kunze <srkunze@mail.de> wrote:
The Python 2.7 multiprocessing module API is ~5 years old at this point, Andrew's referring to the API in Python 3.4+: https://docs.python.org/2/library/multiprocessing.html#module-multiprocessin... As far as the other benefits of asyncio go, one of the perks is that you can stop all processing smoothly just by stopping the event loop, and then they'll all resume together later. This gives you a *lot* more predictability than using threads or processes, which genuinely execute in parallel. After the previous discussion, I wrote http://www.curiousefficiency.org/posts/2015/07/asyncio-tcp-echo-server.html to attempt to convey some of the *practical* benefits of using asyncio to manage interleaved network operations within a single thread. While in the blog post I'm just playing with TCP clients and echo servers at the interactive prompt, it wouldn't be too hard to adapt those techniques to running network client and server testing code as part of a synchronous test suite. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 27 July 2015 at 00:28, Nick Coghlan <ncoghlan@gmail.com> wrote:
It would help if I actually replaced the link with the one I intended to provide...: https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-me... Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Jul 26, 2015, at 13:44, Sven R. Kunze <srkunze@mail.de> wrote:
That's because you're reading the documentation for Python 2.7. In 2.7, you always get fork on Unix and spawn on Windows; the choice of start methods was added in 3.3 or 3.4.
There, I pass a function and this function is executed in another process/thread. Is that just forking?
If you pass a function to a Process in 2.7, on Unix, that's just forking; the parent process returns while the child process calls your function and exits. If you pass it to a Pool, all the pool processes are forked, but they keep running and pick new tasks off a queue. On Windows, on the other hand, a new Process calls CreateNewProcess (the equivalent of fork then exec, or posix_spawn, on Unix) to launch an entirely new Python interpreter, which then imports your module and calls your function. With a Pool, all the new processes get started the same way, then keep running and pick new tasks off a queue.

Big thanks to you, Andrew, Nick and Nikolaus for the latest comments and ideas. I think the table is in a very good shape now and the questions I started this thread with are now answered (at least) to my satisfaction. The relationships are clear (they are all different modules for the same overall purpose), they have different fields of application (cpu vs io) and they have slightly different properties. How do we proceed from here? Btw. the number of different approaches (currently 3, but I assume this will go up in the future) is quite unfortunate. What's even more unfortunate is the missing exchangeability due to API differences and a common syntax for executing functions concurrently. Something that struck me as odd was that asyncio got syntactic sugar although the module itself is actually quite young compared to the support of processes and of threads. These two alternatives have actually no a single bit of syntax support until now. On 26.07.2015 17:00, Andrew Barnert wrote:

On Jul 26, 2015, at 23:54, Sven R. Kunze <srkunze@mail.de> wrote:
It may go up to four with subinterpreters or something like PyParallel, but I can't see much reason for it to go beyond that in the foreseeable future. In theory, there are two possible things missing here: preemptive, non-GIL-restricted, CPU-parallel switching, with implicit shared data (like threads in, say, Java), and the same without implicit shared data but still with efficient explicit shared data (like Erlang processes). But I don't think the former will ever happen in CPython, and in other interpreters it will just use the same API that threads do today (as is already true for Jython).
What's even more unfortunate is the missing exchangeability due to API differences and a common syntax for executing functions concurrently.
But you don't really need any social syntax. Submitting a function to an executor and getting back a future is only tricky in languages like Java because they don't have first-class functions. In Python
Something that struck me as odd was that asyncio got syntactic sugar although the module itself is actually quite young compared to the support of processes and of threads. These two alternatives have actually no a single bit of syntax support until now.
The other two don't need that syntactic support. The point of the await keyword is to mark explicit switch points (yield from also does that, but it's also used in traditional generators, which can be confusing), while async is to mark functions that need to be awaited (yield or yield from also does that, but again, that can be confusing--plus, sometimes you need to make a function awaitable even though it doesn't await anything, which in 3.4 required either a meaningless yield or a special decorator). The fact that coroutines and generators are the same thing under the covers is a very nifty feature for interpreter implementors and maybe library implementors, but end users who just want to write coroutines shouldn't have to understand that. (This was obvious to Greg Ewing when he proposed cofunctions a few years ago, but it looks like nobody else really got it until people had experience using asyncio.) Since threads and processes both do implicit switching, they have no use for anything similar. Every expression may switch, not just await expressions, and every function may get switched out, not just async functions. One way to look at it is that the syntactic supports makes asyncio look almost as nice as threads--as nice as it can given that switches have to be explicit. (You can always use a third-party greenlet based library like gevent to give you implicit but still cooperative switching, which looks just like threads--although that can be misleading because it doesn't act just like threads.)

On 27 July 2015 at 07:54, Sven R. Kunze <srkunze@mail.de> wrote:
Their shared abstraction layer is the concurrent.futures module: https://docs.python.org/3/library/concurrent.futures.html (available for Python 2 as the "futures" module on PyPI) For "call and response" use cases involving pools of worker threads or processes, concurrent.futures is a better option than hand rolling our own pool management and request dispatch and response processing code. That model is integrated into the asyncio event loop to support dispatching blocking tasks to a background thread or process. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 25 July 2015 at 18:37, Sven R. Kunze <srkunze@mail.de> wrote:
Nice, that really clears it up for me. So, let's summarize what we have so far:
Just as a note - even given the various provisos and "it's not that simple" comments that have been made, I found this table extremely useful. Like any such high-level summary, I expect to have to take it with a pinch of salt, but I don't see that as an issue - anyone who doesn't fully appreciate that there are subtleties, probably wouldn't read a longer explanation anyway. So many thanks for taking the time to put this together (and for continuing to improve it). +1 on something like this ending up in the Python docs somewhere. Paul

Next update: Improving Performance by Running Independent Tasks Concurrently - A Survey | processes | threads | coroutines ---------------+-------------------------+----------------------------+------------------------- purpose | cpu-bound tasks | cpu- & i/o-bound tasks | i/o-bound tasks | | | managed by | os scheduler | os scheduler + interpreter | customizable event loop controllable | no | no | yes | | | parallelism | yes | depends (cf. GIL) | no switching | at any time | after any bytecode | at user-defined points shared state | no | yes | yes | | | startup impact | biggest/medium* | medium | smallest cpu impact** | biggest | medium | smallest memory impact | biggest | medium | smallest | | | pool module | multiprocessing.Pool | multiprocessing.dummy.Pool | asyncio.BaseEventLoop solo module | multiprocessing.Process | threading.Thread | --- * biggest - if spawn (fork+exec) and always on Windows medium - if fork alone ** due to context switching On 26.07.2015 14:18, Paul Moore wrote:

Hello, This discussion is pretty interesting to try to list when each architecture is the most efficient, based on the need. However, just a small precision: multiprocess/multiworker isn't antinomic with AsyncIO: You can have an event loop in each process to try to combine the "best" of two "worlds". As usual in IT, it isn't a silver bullet that will care the cancer, however, at least to my understanding, it should be useful for some business needs like server daemons. It isn't a crazy new idea, this design pattern is implemented since a long time ago at least in Nginx: http://www.aosabook.org/en/nginx.html If you are interested in to use this design pattern to build a HTTP server only, you can use easily aiohttp.web+gunicorn: http://aiohttp.readthedocs.org/en/stable/gunicorn.html If you want to use any AsyncIO server protocol (aiohttp.web, panoramisk, asyncssh, irc3d), you can use API-Hour: http://www.api-hour.io And if you want to implement by yourself this design pattern, be my guest, if a Python peon like me has implemented API-Hour, everybody on this mailing-list can do that. For communication between workers, I use Redis, however, you have plenty of solutions to do that. As usual, before to select a communication mechanism you should benchmark based on your use cases: some results should surprise you. Have a nice week. PS: Thank you everybody for EuroPython, it was amazing ;-) -- Ludovic Gasc (GMLudo) http://www.gmludo.eu/ 2015-07-26 23:26 GMT+02:00 Sven R. Kunze <srkunze@mail.de>:

Thanks Ludovic.

On 28.07.2015 22:15, Ludovic Gasc wrote:
I think that should be clear for everybody using any of these modules. But you are right to point it out explicitly.
I hope not to disappoint you. I actually strive not to do that manually for each tiny bit of program (assuming there are many places in the code base where a project could benefit from concurrency).

Personally, I use benchmarks for optimizing problematic code. But if Python were able to do that without my having to choose and correctly configure the right approach (to be determined by benchmarks), that would be awesome. As usual, that needs time to evolve.

I found that benchmark-driven improvements do not last forever, unfortunately, and that most of the time nobody is able to keep track of everything. So, as soon as something changes, you need to start anew. That is not acceptable to me. Btw, that is also a reason why I said recently (in another topic on this list), 'if Python could optimize that without my attention that would be great'.

The simplest solution, and therefore the easiest for all team members to comprehend, is the way to go. If that is not efficient enough, that is actually a Python issue. Readability counts most. And fortunately, in most cases that attitude works perfectly with Python. :)

2015-07-29 8:29 GMT+02:00 Sven R. Kunze <srkunze@mail.de>:
Based on my discussions at EuroPython and PyCon US, it's certainly clear to the middle-class management of the Python community, but not really to the typical Python end-dev: several persons tried to troll me that multiprocessing is more efficient than AsyncIO. To me, it was an opportunity to turn the negative troll attempt into a positive exchange about efficiency, and about understanding before trolling ;-)

More seriously, I have the feeling that it isn't very clear to everybody, especially to newcomers.
Don't worry about that; don't hesitate to "hit", I have a very strong shield to avoid disappointments ;-)
I actually strive not to do that manually for each tiny bit of program
You're right: micro-benchmarks aren't a good way to decide the macro architecture of an application.
(assuming there are many places in the code base where a project could benefit from concurrency).
As usual, it depends on your architecture/needs. If you do a lot more network I/O than CPU work, the network waiting time argues for more concurrency.
It should be technically possible; however, I don't believe too much in implicit optimizations hidden from the end-dev: it's very complicated to hide the magic, few people have the skills to implement that, and the day you have an issue, you're almost alone. See PyPy: certainly one day they will provide a good solution for that, but it isn't trivial to implement; see the time they have needed.

Over time, I believe more and more in educating developers and helping them understand the big picture so that they can apply optimizations explicitly. The learning curve is steeper; however, in the end you have more autonomous developers who will resolve more problems and be less afraid to break the standard frame to innovate. I don't have scientific proof of that, it's only a feeling. However, again, both approaches aren't antinomic: each time we get an automagic optimization without side effects, like computed gotos, I will use it.

I found that benchmark-driven improvements do not last forever,
I fully agree with you: as long as it works, don't break it for pleasure. Moreover, instead of trashing your full stack for efficiency reasons (for example, dropping all your Python code to migrate to Go), where you need to relearn everything, you should maybe first find a solution within your actual stack. At least to me, it was far less complicated to migrate to Python 3, the multiworker pattern and AsyncIO than to migrate to Go/NodeJS/Erlang/...

Moreover, with a niche language it's more complicated to find developers, and harder to spot impostors: some people use rarely-used alternative languages only to try to convince others that they are good developers. Another solution is to add more servers to handle the load, but that isn't always the solution with the smallest TCO; don't forget to count sysadmin costs and the complexity of debugging when you have an issue in production.
Again, I strongly agree with you; however, given the age of Python and the size of the performance community we have (PyPy, Numba, Cython, Pyston...), I believe that fewer and fewer automagic solutions without side effects remain to be found. Not impossible, but harder and harder (I secretly hope that somebody will prove me wrong ;-) ). Maybe we could "steal" some optimizations from other languages? I don't have the technical level to help with that; I'm more a business-logic dev than a low-level dev.
Again and again, I agree with you: the combination of community size (a big toolbox and a lot of developers) and newcomer-friendly readability is clearly a big win-win, at least to me. The only issue I had was efficiency: with the success of our company, we couldn't let the programming language/framework stop us from quickly building efficient daemons; that's why I quickly dropped PHP and Ruby in the past. Now, with our new stack, and based on the trusted predictions of our fortune-telling telephony service department, we could survive a long time before needing to replace some Python parts with C or something else.

Have a nice week-end.

Ludovic Gasc <gmludo@gmail.com> writes:
Do you mean those trolls that measure first and then draw conclusions ;) Could you provide an evidence-based description of the issue, such as http://www.mailinator.com/tymaPaulMultithreaded.pdf but for Python?

On Aug 2, 2015, at 10:09, Akira Li <4kir4.1i@gmail.com> wrote:
The whole point of that post, and of the older von Behrens paper it references, is that a threading-like API can be built that uses explicit cooperative threading and dynamic stacks, and that avoids all of the problems with threads while retaining almost all of the advantages.

That sounds great. Which is probably why it's exactly what Python's asyncio does. Just like von Behrens's thread package, it uses an event loop around poll (or something better) to drive a scheduler for coroutines. The only difference is that Python has coroutines natively, unlike Java or C, and with a nice API, so there's no reason to hide that API. (But if you really want to, you can just use gevent without its monkeypatching library, and then you've got an almost exact equivalent.)

In other words, in the terms used by mailinator, asyncio is exactly the thread package they suggest using instead of an event package. Their evidence shows that something like asyncio can be built for Java, and we don't need evidence that something like asyncio could be built for Python, because Guido already built it. You could compare asyncio with the coroutine API to asyncio with the lower-level callback API (or Twisted with inline callbacks to Twisted with coroutines, etc.), but what would be the point?

Of course multiprocessing vs. asyncio is a completely different question. Now that we have reasonably similar, well-polished APIs for both, people can start running comparisons. But it's pretty easy to predict what they'll find: for some applications, multiprocessing is better; for others, asyncio is better; for others, a simple combination of the two easily beats either alone; and for others, it really doesn't make much difference because concurrency isn't even remotely the key issue. The only thing that really matters to anyone is which is better for _their_ application, and that's something you can't extrapolate from a completely different test any better than you can guess it.
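For readers wondering what that coroutine-vs-callback comparison looks like in practice, here is a minimal sketch (the delays and messages are invented, and the newer async/await syntax stands in for the yield from coroutines of the time); both styles schedule the same kind of delayed call on one event loop:

    import asyncio

    def callback_style(loop):
        # Lower-level callback API: schedule work via call_later().
        loop.call_later(0.05, print, 'hello from a callback')

    async def coroutine_style():
        # Coroutine API: the same delay reads as straight-line code.
        await asyncio.sleep(0.1)
        print('hello from a coroutine')

    if __name__ == '__main__':
        loop = asyncio.get_event_loop()
        callback_style(loop)
        loop.run_until_complete(coroutine_style())
        loop.close()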

+14

Thank you Andrew for your answer.

@Akira: Measure, profile, and benchmark your projects: the learning curve is steeper; however, in the end you'll be able to filter the ideas from the community for your projects more easily. A lot of "good" practices are counter-efficient, like micro-services: if you push the micro-services pattern to the extreme, you'll add latency, because you'll generate more internal traffic for one HTTP request. That doesn't mean you must have a monolithic daemon, only that you should slice your services pragmatically. I have a concrete example of an open source product that abuses this pattern and where I've measured concrete efficiency impacts before and after the introduction of microservices. I can't cite its name because we use it in production and I want to keep a good relationship with them.

--
Ludovic Gasc (GMLudo)
http://www.gmludo.eu/

2015-08-03 3:08 GMT+02:00 Andrew Barnert via Python-ideas <python-ideas@python.org>:
participants (10)
- Akira Li
- Andrew Barnert
- Chris Angelico
- Ludovic Gasc
- Nick Coghlan
- Nikolaus Rath
- Paul Moore
- Steve Dower
- Sturla Molden
- Sven R. Kunze