Asyncio, implementing a new fair scheduling reactor
Hi guys, recently I've been trying to implement a POC of the default event loop provided by asyncio, but using a fair scheduling reactor. At the moment it is just a POC [1], something to test the rationale, pending to be evolved into something more mature, but before that I would prefer to share my thoughts and gather your comments.

The current implementation is based on a FIFO queue that is filled with all of the callbacks that have to be executed. These callbacks can stand for:

1) Running tasks, either started or resumed.
2) Running future callbacks.
3) Running scheduled callbacks.
4) Running file descriptor callbacks.

Worth mentioning that most of them are internally chained somehow; for example, a future callback can wake up a paused task. Also, keep in mind that the API published by asyncio to schedule callbacks can be used by asyncio itself or by the user, `call_soon` for example.

The usual flow of the reactor is the following (sketched in code further below):

1) Check the file descriptors with events, and stack their handles into the reactor.
2) Pop all due scheduled callbacks and push them into the reactor.
3) Iterate over the first N elements of the queue, where N stands for the number of handles stacked at that moment. Handles stacked during that iteration won't be handled; they must wait until the next whole iteration.
4) Go back to point 1.

As you can observe, the IO is only polled once per loop iteration and has to wait until all of the handles present at that moment have been executed. This implements a kind of natural backpressure: reads, and also the accepting of new connections, rely on the buffers managed by the operating system.

That implementation may look simple, but it stands on a solid strategy and follows a KISS design that helps keep bugs away.

Why fair scheduling?

Not all code that shares the same loop has the same requirements: some parts might need N and other parts M. When this code cannot be decoupled - meaning the cost of placing the pieces into separate parts of your architecture is too high - the developer has no way to express this difference and make the underlying implementation aware of it.

For example, consider an API with a regular endpoint accessed by users and another endpoint serving the health-check of the system; they have completely different requirements in terms of IO, not only due to the nature of the resources accessed but also because of the frequency of use. While the health-check is accessed at a known frequency of every X seconds, the other endpoint has a variable frequency of use.

Do you believe that asyncio will be able to preserve the health-check frequency at any moment? Absolutely not.

Therefore, the idea of implementing a fair scheduling reactor is based on the need to address these kinds of situations, giving the developer an interface to isolate different resources.

Basic principles

The basic principles of the implementation are:

- The cost of scheduling has to be the same as in the current implementation: no overhead.
- The design has to follow the current one, keeping the implicit backpressure commented on above.

I will focus on the second principle, taking into account that the first one is a matter of implementation. To achieve the same behavior, the new implementation only splits the resources - handles, scheduled callbacks, file descriptors - into isolated partitions, and then runs the same algorithm as the current implementation for each partition.
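To make the reactor flow described above concrete, here is a minimal sketch of one iteration of the single-queue loop. It is loosely modeled on what asyncio's internal `_run_once` does, with names and details simplified (for instance, `ready` is assumed to be a `collections.deque` of callables and `scheduled` a heap of `(when, callback)` pairs), so it is an illustration rather than the actual CPython code:

    import heapq
    import time

    def run_once(selector, ready, scheduled, timeout=None):
        # 1) Poll the file descriptors and stack their callbacks;
        #    key.data is assumed to hold the fd callback.
        for key, mask in selector.select(timeout):
            ready.append(key.data)

        # 2) Pop every scheduled callback that is already due.
        now = time.monotonic()
        while scheduled and scheduled[0][0] <= now:
            _, callback = heapq.heappop(scheduled)
            ready.append(callback)

        # 3) Run only the N handles present *right now*; anything
        #    appended while running waits for the next iteration.
        for _ in range(len(ready)):
            ready.popleft()()

The partitioned loop described below simply runs this same cycle once per partition.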
The developer can create a new partition using a new function called `spawn`. This function takes a coroutine as an argument; the task that wraps that coroutine, and all of the resources created inside it, will belong to that partition. For example:
async def background_task():
    tasks = [fetch() for i in range(1000)]
    return await asyncio.gather(*tasks)
async def foo():
    return await asyncio.spawn(background_task())
All resources created inside the scope of `background_task` are isolated in one partition; the 1000 sockets will schedule callbacks that are all stacked in the same queue. The partition is by default identified by the hash of the task that wraps `background_task`, but the user can pass an alternative value:
async def foo():
    return await asyncio.spawn(healthcheck(), partition='healthcheck')
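As an aside, the default-id rule could be pictured with a hypothetical helper along these lines. `spawn` here is an invented stand-in, not the actual qloop code, and it only shows the API surface: the hard part, routing every resource created inside the coroutine to its partition, happens inside the reactor and is not shown.

    import asyncio
    import weakref

    # Hypothetical registry mapping each wrapping task to its partition id.
    _partitions = weakref.WeakKeyDictionary()

    def spawn(coro, partition=None):
        # Wrap the coroutine in a task, as asyncio.ensure_future does.
        task = asyncio.ensure_future(coro)
        # Default partition id: the hash of the wrapping task.
        _partitions[task] = partition if partition is not None else hash(task)
        return task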
Internally, the implementation has a default ROOT partition that is used for all of the resources that are not created inside the scope of a spawn call. As you can guess, if you don't use the spawn method the reactor runs exactly like the current implementation, with all of the resources in the same queue.

Round robin between partitions.

The different partitions that exist at a given moment share the CPU using a round-robin strategy. It gives every partition the same chance to run the same number of handles, but with one particularity: each time a partition runs out of handles, the loop restarts - handling the file descriptors and the delayed calls - but only for that specific partition. The side effect is clear: we keep the same backpressure mechanism, but per partition. A sketch of this strategy follows.
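To illustrate, here is a minimal sketch of the partitioned loop, reusing the `run_once` sketch from earlier. The `Partition` layout and the zero timeout are my own assumptions for the example, not necessarily how the qloop POC does it:

    import itertools

    class Partition:
        # Hypothetical per-partition state: each partition owns its
        # own selector, ready queue and timer heap.
        def __init__(self, selector, ready, scheduled):
            self.selector = selector
            self.ready = ready
            self.scheduled = scheduled

    def run_forever(partitions):
        for p in itertools.cycle(partitions):
            # Re-run the whole select/timers/handles cycle, but only
            # for this partition, so the implicit backpressure now
            # applies per partition. A zero timeout keeps one idle
            # partition from blocking the others in select().
            run_once(p.selector, p.ready, p.scheduled, timeout=0)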
The black hole of the current implementation.

There is always a but. I've found at least one situation where this strategy performs in the same way as the current one, without applying any fair scheduling, even though the code uses the spawn method. Have a look at the following snippet:

async def healthcheck(request):
    await request.resp()
async def view(request):
    return await asyncio.spawn(healthcheck(request))
The task that wraps the healthcheck coroutine, although isolated in its own partition, won't be scheduled until the data from the file descriptor is read by a callback that is in fact executed inside the ROOT partition. Therefore, in the worst-case scenario, the fair scheduling degrades into plain FIFO scheduling.

IMHO there is no easy way to solve that issue, at least not without changing the current picture, and trying to solve it might end up in a messy implementation and buggy code. Still, I believe this is worth it, bearing in mind the benefits it brings in all of those cases where the user needs to isolate resources.

Thoughts, comments, and anything else will be welcome.

[1] https://github.com/pfreixes/qloop

--
--pau
A few quick thoughts:

You might find these notes interesting: https://github.com/python-trio/trio/issues/32

It sounds like a more precise description of your scheduler would be "hierarchical FIFO"? I.e., there's a top-level scheduler that selects between "child" schedulers in round-robin/FIFO fashion, and each child scheduler is round-robin/FIFO within its schedulable entities? [1] Generally I would think of a "fair" scheduler as one that notices when, e.g., one task blocks the event loop for a long time when it runs, and penalizes it by not letting it run as often.

For your motivating example of a health-check: it sounds like another way to express your goal would be by attaching a static "priority level" to the health-check task(s), such that they get to run first whenever they're ready. Have you considered that as an alternative approach?

But also... isn't part of the point of a healthcheck that it *should* get slow if the system is overloaded?

-n

[1] http://intronetworks.cs.luc.edu/current/html/queuing.html#hierarchical-queui...
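For what it's worth, the static-priority alternative mentioned above could be pictured with a toy ready queue like the following. This is an illustrative sketch of my own, not an asyncio or qloop API:

    import heapq
    import itertools

    class PriorityReady:
        # Handles are ordered by (priority, arrival counter), so equal
        # priorities stay FIFO and lower numbers run first.
        def __init__(self):
            self._heap = []
            self._counter = itertools.count()

        def push(self, callback, priority=1):
            heapq.heappush(self._heap, (priority, next(self._counter), callback))

        def pop(self):
            return heapq.heappop(self._heap)[2]

    # A health-check handle pushed with priority=0 would always run
    # before regular traffic pushed with the default priority=1.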
--
Nathaniel J. Smith -- https://vorpus.org
Hi Nathaniel,

Let me share my thoughts regarding your response.
> It sounds like a more precise description of your scheduler would be "hierarchical FIFO"? i.e., there's a top-level scheduler that selects between "child" schedulers in round-robin/FIFO fashion, and each child scheduler is round-robin/FIFO within its schedulable entities? [1] Generally I would think of a "fair" scheduler as one that notices when e.g. one task is blocking the event loop for a long time when it runs, and penalizing it by not letting it run as often.
I agree that the meaning of "fair scheduling" is controversial; I would rename my proposal to Fair Queuing [1].

Regarding your notes: pretty interesting, as are your thoughts about how the current shortcomings could be tackled in the near future. But I would say those notes are more about flow control at the network level, whereas mine is more oriented toward business logic. I'm inclined to think that both work at different levels and granularities, and are not mutually exclusive. The draft I presented aims to allow the user - as a pattern - to split and isolate resources from top to bottom.

Yes, I did think about different strategies that might give the user more granularity and finer control, such as weighted partitions (sketched below just for illustration), or simply two queues, LOW and HIGH. But those didn't appeal to me, for the following reasons:

- Giving the user the chance to mix partitions and weights might end up producing code that is difficult to understand.
- The LOW and HIGH priorities were too restrictive, and giving the user control over what is LOW and what is HIGH might also end up with overall performance worse than expected.

The idea behind Fair Queuing is: give the user enough control, but without giving them the chance to screw up the whole thing.
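Purely to make the rejected "weighted partitions" variant concrete, a toy weighted round-robin selector might look like this; the names are invented for the example and this is not part of the qloop POC:

    def weighted_round_robin(weights):
        # `weights` maps partition ids to integer weights, e.g.
        # {'root': 3, 'healthcheck': 1} visits root three times for
        # every visit to healthcheck.
        while True:
            for partition, weight in weights.items():
                for _ in range(weight):
                    yield partition

    # Example: feed each yielded partition id to the per-partition
    # select/timers/handles cycle shown earlier in the thread.
    # scheduler = weighted_round_robin({'root': 3, 'healthcheck': 1})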
> But also... isn't part of the point of a healthcheck that it *should* get slow if the system is overloaded?
Not really; the response of a health-check is dichotomous: yes or no. The problem with sharing resources between the health-check and the user flow is that the latter can impact the former, producing false negatives as a result. The dynamic allocation of resources to scale out horizontally - to absorb more traffic and reduce the pressure on the main flow - is handled by other actors, which can of course rely on metrics emitted by the user flow.

[1] https://en.wikipedia.org/wiki/Fair_queuing

--
--pau
Just a couple of thoughts:

1. A good place to start is the Linux BFS scheduler (simple, correct, good performance). Its typical test case is "make -j4"; perhaps there's a way to simulate something similar?

2. There's space to research async/await scheduling (in the academic sense); older research may have been focused on .NET, and current research probably on browsers and Node - in any case, JavaScript.

d.
participants (3)
- Dima Tisnek
- Nathaniel Smith
- Pau Freixes