[Python-ideas] Concurrency Modules

Sven R. Kunze srkunze at mail.de
Sat Jul 25 19:37:01 CEST 2015


Nice, that really clears it up for me. So, let's summarize what we have 
so far:

               |      1                  |             2              |             3
---------------+-------------------------+----------------------------+------------------------
code lives in  | processes               | threads                    | coroutines
managed by     | os scheduler            | os scheduler + interpreter | customizable event loop
               |                         |                            |
parallelism    | yes                     | depends (cf. GIL)          | no
shared state   | no                      | yes                        | yes
               |                         |                            |
startup impact | biggest                 | medium                     | smallest
cpu impact     | biggest                 | medium                     | smallest
memory impact  | biggest                 | medium                     | smallest
               |                         |                            |
purpose        | cpu-bound tasks         | i/o-bound tasks            | ???
               |                         |                            |
module pool    | multiprocessing.Pool    | multiprocessing.dummy.Pool | ???
module solo    | multiprocessing.Process | threading.Thread           | ???


Please feel free to amend or correct the table and to fill in the ???
parts if you know better.
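
To make the "module pool" row concrete, here is a minimal sketch of the same job run all three ways. It assumes a current Python 3 (asyncio.run needs 3.7 or later). The names bake, bake_async, in_processes, in_threads and in_coroutines are made up for illustration, and asyncio.gather in column 3 is only one possible candidate for the ??? cell, not a settled answer.

```python
import asyncio
from multiprocessing import Pool as ProcessPool
from multiprocessing.dummy import Pool as ThreadPool


def bake(cake):
    """Stand-in for real work (hypothetical). Top-level so a process pool can pickle it."""
    return cake * cake


async def bake_async(cake):
    """Coroutine variant; real code would await I/O here instead of sleeping."""
    await asyncio.sleep(0)  # hand control back to the event loop once
    return bake(cake)


def in_processes(cakes):
    # Column 1: worker processes, no shared state.
    with ProcessPool(2) as pool:
        return pool.map(bake, cakes)


def in_threads(cakes):
    # Column 2: the same Pool API, backed by threads inside one process.
    with ThreadPool(2) as pool:
        return pool.map(bake, cakes)


def in_coroutines(cakes):
    # Column 3 (assumed candidate): coroutines gathered on a single event loop.
    async def main():
        return await asyncio.gather(*(bake_async(c) for c in cakes))

    return asyncio.run(main())


if __name__ == "__main__":
    cakes = [1, 2, 3]
    print(in_processes(cakes))
    print(in_threads(cakes))
    print(in_coroutines(cakes))
```

Columns 1 and 2 differ only in the import line, while column 3 needs a separate coroutine version of the function. The process variant should be started from under the __main__ guard, as shown.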



On 25.07.2015 07:28, Steve Dower wrote:
> "But I still have a question: why can't we use threads for the cakes? (1
> cake = 1 thread)."
>
> Because that is the wrong equality - it's really 1 baker = 1 thread.
>
> Bakers aren't free, you have to pay for each one (memory, stack 
> space), it will take time for each one to learn how your bakery works 
> (startup time), and you will waste some of your own time coordinating 
> them (interthread communication).
>
> You also only have one set of baking equipment (the GIL), buying 
> another bakery is expensive (another process) and fitting more 
> equipment into the current one is very complicated (subinterpreters).
>
> So you either pay a high price for 2 bakers = 2 cakes, or you accept 2 
> bakers = 1.5 cakes (in the same amount of time). It turns out that 
> often 1 baker can do 1.5 cakes in the same time as well, and it's much 
> easier to reason about and implement correctly.
>
> Hope that makes sense and I'm not stretching things too far. Guess I 
> should make this into a talk for PyCon next year.
>
> Cheers,
> Steve
>
> Top-posted from my Windows Phone
> ------------------------------------------------------------------------
> From: Sven R. Kunze <mailto:srkunze at mail.de>
> Sent: 7/24/2015 14:41
> To: Mark Summerfield <mailto:m.n.summerfield at googlemail.com>; 
> python-ideas at googlegroups.com <mailto:python-ideas at googlegroups.com>; 
> python-ideas at python.org <mailto:python-ideas at python.org>; Steve Dower 
> <mailto:Steve.Dower at microsoft.com>
> Subject: Re: [Python-ideas] Concurrency Modules
>
> Hi. I am back. First of all, thanks for your eager participation. I would
> like to pick up on Steve's and Mark's examples, as they seem to be very
> good illustrations of the issue I still have.
>
> Steve explained why asyncio is great and Mark explained why
> threading+multiprocessing is great. Each from his own perspective and
> focusing on the internal implementation details. To me, all approaches
> can now be fit into this sort of table. Please, correct me if it's wrong
> (that is very important):
>
> # | code lives in | managed by
> --+---------------+-------------
> 1 | processes     | os scheduler
> 2 | threads       | os scheduler
> 3 | tasks         | event loop
>
>
>
> But the original question still stands:
>
>      Which one to use?
>
>
> Ignoring little details like 'shared state', 'custom prioritization',
> etc., they all look the same to me, and what it all comes down to are
> these nasty little details people try to explain so eagerly. I am not
> saying that is a bad thing, but it has some implications for production
> code that I do not like, which I am going to explain in the following.
>
> Say we have decided on approach N because of some requirements
> (examples from here and there, guidelines given by smart people,
> customer needs etc.) and have written a hundred thousand lines of code.
> What if these requirements change 6 years in the future?
> What if the maintainer of approach N decides to change it in such a way
> that it is no longer compatible with our requirements?
> From what I can see, there is no easy way 'back' to another
> approach. They all have different APIs, basically for: 'executing a
> function and returning its precious result (the cake)'.
>
>
> asyncio gives us the flexibility to choose a prioritization mechanism.
> Nice to have, because we are now independent of the os scheduler.
> But do we really ever need that?
> What is wrong with the os scheduler?
> Would that not mean that Mark had better switch to asyncio?
> We don't know if we would ever need that in project A and project B.
> What now? Use asyncio just in case? Preemptively?
>
>
> @Steve
> Thanks for that great explanation of how asyncio works and its
> relationship to threads/processes.
>
> But I still have a question: why can't we use threads for the cakes? (1
> cake = 1 thread). Not saying that asyncio would be a bad idea to use
> here, but couldn't we accomplish the same functionality by using threads?
>
>
>
> I think, after we've settled the above questions, we should change the
> focus from
>
>      How do they work internally and what are the tiny differences?
> (answered very well by Mark)
>
> to
>
>      When do I use which one?
>
>
> The latter question is what actually counts for production code. It
> is quite interesting to know and to ponder all the differences,
> dependencies, corner cases etc. However, when it comes down to
> 'executing a piece of code and returning its result', you end up
> having to decide which approach to choose. You won't implement all 3
> different ways just because it is great to see all the nasty little
> details click into place.
>
>
> On Thursday, July 9, 2015 at 11:54:11 PM UTC+1, Sven R. Kunze wrote:
> >
> > In order to make a sound decision for the question: "Which one(s) do I
> > use?", at least the following items should be somehow defined clearly
> > for these modules:
> >
> > 1) relationship between the modules
> > 2) NON-overlapping usage scenarios
> > 3) future development intentions
> > 4) ease of usage of the modules => future syntax
> > 5) examples
>
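
On the quoted worry that the approaches "all have different APIs" for executing a function and returning its result: for columns 1 and 2, the standard library's concurrent.futures executors share one interface, so calling code does not have to commit to either. A minimal sketch, with bake and bake_all as made-up names; it does not cover coroutines:

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor


def bake(cake):
    """Stand-in for real work (hypothetical). Top-level so processes can pickle it."""
    return cake * cake


def bake_all(executor: Executor, cakes):
    """Uses only the shared Executor interface, so threads and processes are interchangeable."""
    futures = [executor.submit(bake, cake) for cake in cakes]
    return [future.result() for future in futures]  # results in submission order


if __name__ == "__main__":
    # Swapping the approach is a one-word change at the call site.
    for executor_cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        with executor_cls(max_workers=2) as executor:
            print(executor_cls.__name__, bake_all(executor, [1, 2, 3]))
```

This only narrows the gap between threads and processes. Moving such code to an event loop would still mean rewriting bake_all as a coroutine.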
