[Python-ideas] Concurrency Modules

Steve Dower Steve.Dower at microsoft.com
Sat Jul 25 07:28:50 CEST 2015


"But I still have a question: why can't we use threads for the cakes? (1
cake = 1 thread)."

Because that is the wrong equality - it's really 1 baker = 1 thread.

Bakers aren't free: you have to pay for each one (memory, stack space), each one takes time to learn how your bakery works (startup time), and you waste some of your own time coordinating them (inter-thread communication).

You also only have one set of baking equipment (the GIL); buying another bakery is expensive (another process), and fitting more equipment into the current one is very complicated (subinterpreters).

So you either pay a high price for 2 bakers = 2 cakes, or you accept 2 bakers = 1.5 cakes (in the same amount of time). It turns out that often 1 baker can do 1.5 cakes in the same time as well, and it's much easier to reason about and implement correctly.
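
A minimal sketch of the "1 baker, several cakes" case, assuming the time a cake spends in the oven can be modelled as pure waiting (here `asyncio.sleep`). The names `bake`, `bake_all` and `run_bakery` are made up for illustration, and `asyncio.run` needs Python 3.7 or later, which is newer than this thread:

```python
import asyncio
import threading


async def bake(cake, oven_seconds=0.01):
    # Waiting on the oven yields control, so the single baker (thread)
    # can start on another cake in the meantime.
    await asyncio.sleep(oven_seconds)
    # Record which thread finished this cake.
    return cake, threading.get_ident()


async def bake_all(cakes):
    # gather() runs the coroutines concurrently on one event loop
    # and returns their results in the order they were passed in.
    return await asyncio.gather(*(bake(cake) for cake in cakes))


def run_bakery(cakes):
    return asyncio.run(bake_all(cakes))


results = run_bakery(["sponge", "carrot", "lemon"])
```

Every entry in `results` carries the same thread id: three cakes, one baker. This only pays off when the cakes spend most of their time waiting; CPU-bound work would still run one piece at a time.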

Hope that makes sense and I'm not stretching things too far. Guess I should make this into a talk for PyCon next year.

Cheers,
Steve

Top-posted from my Windows Phone
________________________________
From: Sven R. Kunze <srkunze at mail.de>
Sent: 7/24/2015 14:41
To: Mark Summerfield <m.n.summerfield at googlemail.com>; python-ideas at googlegroups.com; python-ideas at python.org; Steve Dower <Steve.Dower at microsoft.com>
Subject: Re: [Python-ideas] Concurrency Modules

Hi, I am back. First of all, thanks for your eager participation. I would
like to pick up on Steve's and Mark's examples, as they seem to be very
good illustrations of the issue I still have.

Steve explained why asyncio is great and Mark explained why
threading+multiprocessing is great, each from his own perspective and
focusing on the internal implementation details. To me, all approaches
now fit into this sort of table. Please correct me if it is wrong
(that is very important):

# | code lives in | managed by
--+---------------+-------------
1 | processes     | os scheduler
2 | threads       | os scheduler
3 | tasks         | event loop



But the original question still stands:

     Which one to use?


Ignoring little details like 'shared state', 'custom prioritization',
etc., they all look the same to me, and what it all comes down to are
these nasty little details people try to explain so eagerly. I am not
saying that is a bad thing, but it has some implications for production
code that I do not like, and I explain them below.

Say we have decided on approach N because of some requirements
(examples from here and there, guidelines given by smart people,
customer needs, etc.) and have written a hundred thousand lines of code.
What if these requirements change 6 years in the future?
What if the maintainer of approach N decides to change it in such a way
that it is no longer compatible with our requirements?
From what I can see, there is no easy way 'back' to another approach.
They all have different APIs for what is basically the same thing:
'executing a function and returning its precious result (the cake)'.
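
For rows 1 and 2 of the table, the standard library's `concurrent.futures` narrows this gap somewhat: threads and processes sit behind the same `Executor` interface, so the call site need not change. A rough sketch, with `bake` and `bake_with` as hypothetical names; asyncio coroutines are not covered by this interface, so it does not settle the question for row 3:

```python
from concurrent.futures import Executor, ThreadPoolExecutor


def bake(cake):
    # Stand-in for 'executing a function and returning its result'.
    return f"{cake} cake"


def bake_with(executor: Executor, cakes):
    # Depends only on the Executor interface, not on threads vs processes.
    futures = [executor.submit(bake, cake) for cake in cakes]
    return [future.result() for future in futures]


with ThreadPoolExecutor(max_workers=2) as pool:
    cakes = bake_with(pool, ["lemon", "carrot"])
```

Passing a `ProcessPoolExecutor` instead should work with the same `bake_with`, provided the function and its arguments can be pickled and the script is guarded by `if __name__ == "__main__":` on platforms that spawn new processes.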


asyncio gives us the flexibility to choose a prioritization mechanism.
Nice to have, because we are now independent of the OS scheduler.
But do we really ever need that?
What is wrong with the OS scheduler?
Would that not mean that Mark had better switch to asyncio?
We don't know whether we would ever need that in project A and project B.
What now? Use asyncio just in case? Preemptively?


@Steve
Thanks for that great explanation of how asyncio works and its
relationship to threads/processes.

But I still have a question: why can't we use threads for the cakes? (1
cake = 1 thread). Not saying that asyncio would be a bad idea to use
here, but couldn't we accomplish the same functionality by using threads?
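
What "1 cake = 1 thread" could look like, as a rough sketch using only `threading`. The names `bake` and `bake_in_threads` are hypothetical, and `time.sleep` stands in for the oven:

```python
import threading
import time


def bake(cake, results, lock, oven_seconds=0.01):
    # Blocking wait: this thread can do nothing else while the cake bakes.
    time.sleep(oven_seconds)
    with lock:
        # Record which thread produced which cake.
        results[cake] = threading.current_thread().name


def bake_in_threads(cakes):
    results = {}
    lock = threading.Lock()
    # One dedicated thread per cake.
    threads = [
        threading.Thread(target=bake, args=(cake, results, lock),
                         name=f"baker-{cake}")
        for cake in cakes
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


bakers = bake_in_threads(["sponge", "carrot", "lemon"])
```

This gives the same cakes, but each one pays for its own thread, and the shared `results` dict is guarded by a lock.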



I think, after we've settled the above questions, we should change the
focus from

     How do they work internally and what are the tiny differences?
(answered very well by Mark)

to

     When do I use which one?


The latter question is what actually counts for production code. It
is quite interesting to know and to ponder all the differences,
dependencies, corner cases, etc. However, when it actually comes down
to 'executing a piece of code and returning its result', you end up
having to decide on one approach. You won't implement all 3 different
ways just because it is great to see all the nasty little details
click into place.


On Thursday, July 9, 2015 at 11:54:11 PM UTC+1, Sven R. Kunze wrote:
>
> In order to make a sound decision for the question: "Which one(s) do I
> use?", at least the following items should be somehow defined clearly
> for these modules:
>
> 1) relationship between the modules
> 2) NON-overlapping usage scenarios
> 3) future development intentions
> 4) ease of usage of the modules => future syntax
> 5) examples
