a multiProcess scheduler

Hey guys,

I recently changed jobs, and for that I have been doing a lot of programming interviews (whiteboard questions). One common question in those interviews was "how to implement a scheduler?", followed by "how to make it multi-processing?". I have to confess that I only had a rough idea of how to do that.

After the interview period, I started searching for a solution and could not find one. The std python implementation of a scheduler says "No multi-threading is implied; you are supposed to hack that yourself, or use a single instance per application."

So, I hacked my own implementation of a multi-process scheduler in python: https://github.com/thalesfc/Multprocess-Scheduler

What do you guys think? How can I improve it? Is it relevant enough to be incorporated into std python?

Thanks, Thales.
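For context, the sentence quoted above comes from the standard library's `sched` module. A minimal sketch of how that single-threaded scheduler behaves is shown here; the helper name `run_in_order` and the event labels are made up for illustration and are not part of the linked project.

```python
import sched
import time


def run_in_order(events):
    """Schedule (delay_seconds, label) pairs and return the labels in firing order.

    Hypothetical helper: everything runs in the calling thread, which is
    exactly the limitation the quoted docstring describes.
    """
    fired = []
    scheduler = sched.scheduler(time.monotonic, time.sleep)
    for delay, label in events:
        # enter(delay, priority, action, argument): run action(*argument) after delay
        scheduler.enter(delay, 1, fired.append, argument=(label,))
    scheduler.run()  # blocks until the event queue is empty
    return fired


if __name__ == "__main__":
    print(run_in_order([(0.2, "second"), (0.1, "first")]))
```

Because `run()` blocks and each action executes inline, a slow action delays every event queued behind it, which is the gap a multi-process scheduler is trying to fill.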

On 29 August 2016 at 11:50, Thales filizola costa <thalesfc@gmail.com> wrote:
What do you guys think? How to improve it? Is it relevant enough to be incorporated to std python ?
There are actually quite a few distributed schedulers out there (which can expand beyond a single machine), but "python multiprocess scheduler" isn't likely to bring them up in a web search (as when you're limited to a single machine, multiprocessing.Pool or concurrent.futures.ProcessPoolExecutor is generally already good enough).

At a Python level, Celery is probably the most popular option for that: http://www.celeryproject.org/

Another well-established option is Kamaelia: http://www.kamaelia.org/Home.html

Dask is a more recent alternative specifically focused on computational tasks: http://dask.pydata.org/en/latest/

Once you move outside Python specific tooling, there are even more language independent options to play with, including the likes of Mesos and Kubernetes.

Cheers, Nick.

P.S. It's a fairly sad indictment of our industry that people think this is a sensible question to ask in developer interviews - the correct answer from a business efficiency perspective is "I wouldn't, I would use an existing open source task scheduler rather than inventing my own", just as the correct answer to "How would you implement a sort algorithm?" from that perspective is "I wouldn't, as the Python standard library's sorting implementation is vastly superior to anything I could come up with in 5 minutes on a whiteboard, and the native sorting capabilities of databases are also excellent". Reimplementing existing software from first principles is a great learning exercise, but it's not particularly relevant to the task of day-to-day software construction in most organisations.

(Alternatively, if the answer the interviewer is looking for is "I wouldn't, I would use...", then it may be an unfair "Gotcha!" question, and those aren't cool either, since they expect the interviewee to be able to read the interviewer's mind, rather than just answering the specific question they were asked)

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
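As a rough illustration of the single-machine case mentioned above, here is a minimal sketch using concurrent.futures.ProcessPoolExecutor. The job function `cpu_bound` and the helper `run_jobs` are hypothetical names chosen for this example, and the worker count is an arbitrary assumption.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed


def cpu_bound(n):
    """Hypothetical stand-in for a real job: sum of squares below n."""
    return sum(i * i for i in range(n))


def run_jobs(executor, func, inputs):
    """Submit func(x) for every input and return a {input: result} dict."""
    futures = {executor.submit(func, x): x for x in inputs}
    # as_completed yields each future as soon as its worker finishes
    return {futures[f]: f.result() for f in as_completed(futures)}


if __name__ == "__main__":
    # The main guard matters: worker processes may re-import this module.
    with ProcessPoolExecutor(max_workers=4) as pool:
        print(run_jobs(pool, cpu_bound, [10, 100, 1000]))
```

Passing the executor in as an argument keeps `run_jobs` independent of the pool type, so the same helper also works with a ThreadPoolExecutor for I/O-bound jobs.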

Hi Nick, I have just checked all the links you posted; they are indeed very interesting and very efficient. However, I think they are very complicated in terms of installation and setup, and I still see a lot of uses for a multi-process scheduler.

2016-08-28 20:32 GMT-07:00 Nick Coghlan <ncoghlan@gmail.com>:

On 29 August 2016 at 15:53, Thales filizola costa <thalesfc@gmail.com> wrote:
Potentially, but one of the big challenges you'll face is to establish how it differs from using asyncio in the current process to manage tasks dispatched to other processes via run_in_executor, and when specifically it would be a useful thing for a developer to have in the builtin toolkit (vs being something they can install from PyPI).

Don't get me wrong, I think it's really cool that you were able to implement this - there's just a big gap between "implementing this was useful to me" and "this is sufficiently useful in a wide enough range of cases not otherwise addressed by the standard library that it should be added as a new standard application building block".

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
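To make the comparison concrete, here is one possible sketch of the asyncio approach described above: the event loop in the current process handles the timing, and each job is handed to another process via run_in_executor. The names `work`, `run_later` and `schedule_all`, and the (delay, func, arg) job format, are assumptions made for this example only.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor


def work(x):
    """Hypothetical CPU-bound job; top-level so it can be pickled for workers."""
    return x * x


async def run_later(executor, delay, func, *args):
    """Wait `delay` seconds in the event loop, then run func(*args) in the executor."""
    await asyncio.sleep(delay)
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, func, *args)


async def schedule_all(executor, jobs):
    """Run (delay, func, arg) jobs concurrently; results keep the input order."""
    return await asyncio.gather(
        *(run_later(executor, delay, func, arg) for delay, func, arg in jobs)
    )


if __name__ == "__main__":
    jobs = [(0.2, work, 3), (0.1, work, 4)]
    with ProcessPoolExecutor() as pool:
        print(asyncio.run(schedule_all(pool, jobs)))  # [9, 16]
```

The delays here are relative to the moment `schedule_all` starts, and the event loop stays free while the workers compute, so many pending jobs cost only one sleeping coroutine each.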

participants (4)
- MRAB
- Nick Coghlan
- Sven R. Kunze
- Thales filizola costa