[Python-ideas] Concurrency Modules
Ludovic Gasc
gmludo at gmail.com
Sat Aug 1 15:48:03 CEST 2015
2015-07-29 8:29 GMT+02:00 Sven R. Kunze <srkunze at mail.de>:
> Thanks Ludovic.
>
> On 28.07.2015 22:15, Ludovic Gasc wrote:
>
> Hello,
>
> This discussion is pretty interesting as an attempt to list when each
> architecture is the most efficient, depending on the need.
>
> However, one small clarification: multiprocess/multiworker isn't
> incompatible with AsyncIO: you can have an event loop in each process to
> combine the "best" of both "worlds".
> As usual in IT, it isn't a silver bullet that will cure cancer; however,
> at least to my understanding, it should be useful for some business needs
> like server daemons.
>
>
> I think that should be clear for everybody using any of these modules. But
> you are right to point it out explicitly.
>
Based on my discussions at EuroPython and PyCon US, it's certainly clear
to the "middle management" of the Python community, but not really to the
typical Python end-developer: several people tried to troll me, claiming
that multiprocessing is more efficient than AsyncIO.
To me, it was an opportunity to turn a negative troll attempt into a
positive exchange about efficiency, and about understanding before
trolling ;-)
More seriously, I have the feeling that it isn't very clear to everybody,
especially to newcomers.
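To make the pattern concrete, here is a minimal sketch, with the standard
library only, that starts one asyncio event loop per process (the worker
count and the coroutine are only placeholders):

    import asyncio
    import multiprocessing

    @asyncio.coroutine  # Python 3.4 syntax; on 3.5+ you could write "async def"
    def serve():
        # Placeholder for the real work, e.g. an AsyncIO server protocol.
        yield from asyncio.sleep(1)

    def worker():
        # Each process creates and runs its own, fully independent event loop.
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        loop.run_until_complete(serve())
        loop.close()

    if __name__ == '__main__':
        processes = [multiprocessing.Process(target=worker) for _ in range(4)]
        for p in processes:
            p.start()
        for p in processes:
            p.join()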
> It isn't a crazy new idea; this design pattern has been implemented for a
> long time, at least in Nginx: http://www.aosabook.org/en/nginx.html
>
> If you are interested in using this design pattern to build an HTTP
> server only, you can easily use aiohttp.web + gunicorn:
> http://aiohttp.readthedocs.org/en/stable/gunicorn.html
> If you want to use any AsyncIO server protocol (aiohttp.web, panoramisk,
> asyncssh, irc3d), you can use API-Hour: http://www.api-hour.io
>
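For illustration, the smallest aiohttp.web application that gunicorn can
serve looks roughly like this (module and handler names are mine; see the
documentation linked above for the exact worker setup):

    # app.py -- a minimal aiohttp.web application for gunicorn
    import asyncio
    from aiohttp import web

    @asyncio.coroutine
    def hello(request):
        return web.Response(text='Hello, world')

    app = web.Application()
    app.router.add_route('GET', '/', hello)

    # gunicorn starts one event loop per worker process, e.g.:
    #   gunicorn app:app --workers 4 --worker-class aiohttp.worker.GunicornWebWorker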
> And if you want to implement this design pattern yourself, be my guest:
> if a Python peon like me could implement API-Hour, everybody on this
> mailing list can do it.
>
> For communication between workers, I use Redis; however, there are plenty
> of solutions for that.
> As usual, before selecting a communication mechanism you should benchmark
> it on your use cases: some results may surprise you.
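As one example among many, a worker-to-worker notification with redis-py
pub/sub can be as small as this sketch (the channel name is arbitrary, and
the two sides would normally live in different processes):

    import redis

    connection = redis.StrictRedis()

    # Producer side: any worker can publish an event.
    connection.publish('workers.events', 'cache-invalidate')

    # Consumer side: each worker subscribes and reacts to messages.
    subscriber = connection.pubsub()
    subscriber.subscribe('workers.events')
    for message in subscriber.listen():
        if message['type'] == 'message':
            print('received:', message['data'])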
>
>
> I hope not to disappoint you.
>
Don't worry about that; don't hesitate to "hit", I have a very strong
shield against disappointments ;-)
> I actually strive not to do that manually for each tiny bit of program
>
You're right, micro-benchmarks aren't a good way to decide the macro
architecture of an application.
> (assuming there are many places in the code base where a project could
> benefit from concurrency).
>
As usual, it depends on your architecture/needs.
If you do much more network I/O than CPU work, the time spent waiting on
the network argues for more concurrency.
> Personally, I use benchmarks for optimizing problematic code.
>
> But if Python would be able to do that without choosing the right and
> correctly configured approach (to be determined by benchmarks) that would
> be awesome. As usual, that needs time to evolve.
>
It should be technically possible; however, I don't believe much in
optimizations hidden implicitly from the end-developer: it's very
complicated to hide the magic, few people have the skills to implement
that, and the day you have an issue, you're almost alone.
See PyPy: certainly one day they will provide a good solution for this,
but it isn't trivial to implement; look at the time they have needed.
Over time, I believe more and more in educating developers: help them
understand the big picture and apply optimizations explicitly. The
learning curve is steeper; however, in the end, you have more autonomous
developers who will solve more problems and be less afraid to break the
standard frame to innovate.
I don't have scientific proof of that; it's only a feeling.
However, again, the two approaches aren't incompatible: each time we get
an automagic optimization without side effects, like computed gotos, I
will use it.
> I found that improvements resulting from benchmarks do not last forever,
> unfortunately, and that most of the time nobody is able to keep track of
> everything. So, as soon as something changes, you need to start anew. That
> is not acceptable for me.
>
I fully agree with you: while it works, don't break it just for pleasure.
Moreover, instead of trashing your whole stack for efficiency reasons (for
example, dropping all your Python code to migrate to Go), where you would
need to relearn everything, you should maybe first look for a solution
within your current stack.
At least for me, it was much less complicated to migrate to Python 3, the
multiworker pattern, and AsyncIO than it would have been to migrate to
Go/NodeJS/Erlang/...
Moreover, with a niche language it's more complicated to find developers,
and harder to spot impostors:
some people adopt rarely-used alternative languages only to try to
convince others that they are good developers.
Another solution is to add more servers to handle the load, but that isn't
always the solution with the smallest TCO: don't forget to count sysadmin
costs and the extra complexity of debugging when you have an issue in
production.
> Btw. that is also a reason why I said recently (another topic on this
> list), 'if Python could optimize that without my attention that would be
> great'. The simplest solution and therefore the easiest to comprehend for
> all team members is the way to go.
>
Again, I strongly agree with you; however, given the age of Python and
the large performance community we have (PyPy, Numba, Cython, Pyston...),
I believe that fewer and fewer automagic solutions without side effects
will be found. Not impossible, but harder and harder (I secretly hope
that somebody will prove me wrong ;-) )
Maybe by "stealing" some optimizations from other languages?
I don't have the technical level to help with that; I'm more a
business-logic dev than a low-level dev.
> If that is not efficient enough, that is actually a Python issue.
> Readability counts most. And fortunately, in most cases that attitude
> works perfectly with Python. :)
>
Again and again, I agree with you: the combination of community size (a
big toolbox and a lot of developers) and newcomer-friendly readability is
clearly a big win-win, at least to me.
The only issue I had was efficiency: with the success of our company, we
couldn't let the programming language/framework stop us from building
efficient daemons quickly; that's why I quickly dropped PHP and Ruby in
the past.
Now, with our new stack, and based on the trusted predictions of our
fortune-telling telephony service department, we should survive a long
time before having to replace some Python parts with C or anything else.
Have a nice weekend.
>
>
> Have a nice week.
>
> PS: Thank you everybody for EuroPython, it was amazing ;-)
>
> --
> Ludovic Gasc (GMLudo)
> http://www.gmludo.eu/
>
> 2015-07-26 23:26 GMT+02:00 Sven R. Kunze <srkunze at mail.de>:
>
>> Next update:
>>
>>
>> Improving Performance by Running Independent Tasks Concurrently - A Survey
>>
>>
>>                | processes               | threads                    | coroutines
>> ---------------+-------------------------+----------------------------+------------------------
>> purpose        | cpu-bound tasks         | cpu- & i/o-bound tasks     | i/o-bound tasks
>>                |                         |                            |
>> managed by     | os scheduler            | os scheduler + interpreter | customizable event loop
>> controllable   | no                      | no                         | yes
>>                |                         |                            |
>> parallelism    | yes                     | depends (cf. GIL)          | no
>> switching      | at any time             | after any bytecode         | at user-defined points
>> shared state   | no                      | yes                        | yes
>>                |                         |                            |
>> startup impact | biggest/medium*         | medium                     | smallest
>> cpu impact**   | biggest                 | medium                     | smallest
>> memory impact  | biggest                 | medium                     | smallest
>>                |                         |                            |
>> pool module    | multiprocessing.Pool    | multiprocessing.dummy.Pool | asyncio.BaseEventLoop
>> solo module    | multiprocessing.Process | threading.Thread           | ---
>>
>>
>> *
>> biggest - if spawn (fork+exec) and always on Windows
>> medium - if fork alone
>>
>> **
>> due to context switching
>>
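(A side note on the last rows of the table: the process pool and the
thread pool share the same interface, so a sketch like the following runs
with either one; the worker function is only an example.)

    # The two pool rows above share one API; swap the import to switch models.
    from multiprocessing import Pool          # processes
    # from multiprocessing.dummy import Pool  # threads, identical interface

    def square(x):
        return x * x

    if __name__ == '__main__':
        with Pool(4) as pool:
            print(pool.map(square, range(10)))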
>>
>> On 26.07.2015 14:18, Paul Moore wrote:
>>
>> Just as a note - even given the various provisos and "it's not that
>> simple" comments that have been made, I found this table extremely
>> useful. Like any such high-level summary, I expect to have to take it
>> with a pinch of salt, but I don't see that as an issue - anyone who
>> doesn't fully appreciate that there are subtleties, probably wouldn't
>> read a longer explanation anyway.
>>
>> So many thanks for taking the time to put this together (and for
>> continuing to improve it).
>>
>> You are welcome. :)
>>
>> +1 on something like this ending up in the Python docs somewhere.
>>
>> Not sure what the process for this is, but I think the Python gurus will
>> find a way.
>>
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>
>
>