Re: [Async-sig] Async-sig Digest, Vol 14, Issue 5
I have one comment regarding what Nathaniel Smith said: """ Of course, if you really want an end to end measure you can do things like instrument your actual logic, see how fast you're replying to http requests or whatever, which is even more valid but creates complications because some requests are supposed to take longer than others, etc. """ you can't always use http requests or other metrics to measure how busy is your worker. Some examples of invalid metrics: - Your service can depend on external services that may be the ones making you slow. In this case, scaling up because a spike on your http response time doesn't help, its a waste of resources - You can't use metrics like CPU, memory, etc. Of course, they may mean something, but how do you know if its your worker who is using the CPU or any other random process triggered because admin connected manually, redis is also in the same machine, and a large of "please don't do that in PROD" list :) So, IMO how busy is the loop (I don't know what's the correct metric name here) is specific to that worker which will tell you that your service is dying because it is receiving too many asyncio.Tasks to be served inside what you consider a normal window time. For example, if you have an API where more than 1 second response time is not acceptable, if that loop metric is above 2 seconds (stable) you know you have to do something (scale up, improve something, etc). On Sat, Aug 12, 2017 at 6:03 PM <async-sig-request@python.org> wrote:
Send Async-sig mailing list submissions to async-sig@python.org
To subscribe or unsubscribe via the World Wide Web, visit https://mail.python.org/mailman/listinfo/async-sig or, via email, send a message with subject or body 'help' to async-sig-request@python.org
You can reach the person managing the list at async-sig-owner@python.org
When replying, please edit your Subject line so it is more specific than "Re: Contents of Async-sig digest..."
Today's Topics:
1. Re: Feedback, loop.load() function (Nathaniel Smith)
----------------------------------------------------------------------
Message: 1
Date: Fri, 11 Aug 2017 11:04:43 -0700
From: Nathaniel Smith <njs@pobox.com>
To: Pau Freixes <pfreixes@gmail.com>
Cc: async-sig@python.org
Subject: Re: [Async-sig] Feedback, loop.load() function
Message-ID: <CAPJVwBmEx7UzMtN6WGVpWvhbrOhnhzrv7EkOf3vLseiyRf6dbQ@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
It looks like your "load average" is computing something very different than the traditional Unix "load average". If I'm reading right, yours is a measure of what percentage of the time the loop spent sleeping waiting for I/O, taken over the last 60 ticks of a 1 second timer (so generally slightly longer than 60 seconds). The traditional Unix load average is an exponentially weighted moving average of the length of the run queue.
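(For reference, the traditional Unix calculation can be sketched like this; the sampling interval and time constant are illustrative, not the kernel's exact values:)

```python
import math

def ewma_load(samples, interval=5.0, tau=60.0):
    # Unix-style load average: an exponentially weighted moving average
    # of run-queue-length samples taken every `interval` seconds, with
    # time constant `tau` (illustrative constants, not the kernel's).
    decay = math.exp(-interval / tau)
    load = 0.0
    for n in samples:
        load = load * decay + n * (1 - decay)
    return load

# A queue that steadily holds 2 runnable tasks converges toward 2.0:
# the metric can exceed 1, unlike a "fraction of time not sleeping",
# which saturates at 100%.
print(round(ewma_load([2] * 200), 2))
```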
Is one of those definitions better for your goal of detecting when to shed load? I don't know. But calling them the same thing is pretty confusing :-). The Unix version also has the nice property that it can actually go above 1; yours doesn't distinguish between a service whose load is at exactly 100% of capacity and barely keeping up, versus one that's at 200% of capacity and melting down. But for load shedding maybe you always want your tripwire to be below that anyway.
More broadly we might ask what's the best possible metric for this purpose, and how do we judge? A nice thing about the JavaScript library you mention is that scheduling delay is a real thing that directly impacts quality of service; it's more of an "end to end" measure in a sense. Of course, if you really want an end to end measure you can do things like instrument your actual logic, see how fast you're replying to http requests or whatever, which is even more valid but creates complications because some requests are supposed to take longer than others, etc. I don't know which design goals are important for real operations.
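(The "instrument your actual logic" approach can be as small as a timing wrapper around a request handler. This is a sketch with made-up names, not any specific framework's API; the complication mentioned above still applies, since some handlers are supposed to be slow and thresholds usually end up per-route:)

```python
import asyncio
import functools
import time

latencies = []  # in a real service this would feed a histogram/percentiles

def timed(handler):
    # Record how fast we're replying to each request: the "end to end"
    # measure. Hypothetical helper, not a real framework decorator.
    @functools.wraps(handler)
    async def wrapper(request):
        start = time.monotonic()
        try:
            return await handler(request)
        finally:
            latencies.append(time.monotonic() - start)
    return wrapper

@timed
async def handle(request):
    await asyncio.sleep(0.05)   # simulated work
    return "ok"

result = asyncio.run(handle({"path": "/"}))
print(result, round(latencies[0], 2))
```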
On Aug 6, 2017 3:57 PM, "Pau Freixes" <pfreixes@gmail.com> wrote:
Hi guys,
I would appreciate any feedback about the idea of implementing a new load function that reports how saturated your reactor is.
I have a proof of concept [1] of how the load function might be implemented in the Asyncio python loop.
The idea is to provide a method that can be used to ask about the load of the reactor at a specific time. This implementation returns the load taking into account the last 60 seconds, but it could easily return the 5-minute and 15-minute values or others.
This method can help services built on top of Asyncio to implement back-pressure mechanisms that take into account a metric coming from the loop itself, instead of inferring the load from metrics provided by external agents, such as CPU usage, load average, or others.
Nowadays there exist alternatives in other languages that address this situation using the lag of a scheduled callback, which grows when the reactor is saturated. The best-known implementation is toobusy [2], a Node.js library.
IMHO the solution provided by toobusy depends strongly on the hardware, since the maximum allowed lag has to be tuned in milliseconds [3]. In the proof of concept presented here, the user can instead give an exact value meaning the percentage of load, e.g. 0.9.
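(The difference between the two tunables can be made concrete; the threshold values below are illustrative, not defaults from either library:)

```python
def too_busy_by_lag(lag_ms, max_lag_ms=70):
    # toobusy-style tunable: an absolute lag threshold in milliseconds,
    # which has to be re-tuned when the hardware changes
    # (70 ms is an illustrative value, not a documented default).
    return lag_ms > max_lag_ms

def too_busy_by_load(load, max_load=0.9):
    # loop.load()-style tunable: a relative fraction of capacity,
    # meaningful regardless of how fast the machine is.
    return load > max_load

print(too_busy_by_lag(120), too_busy_by_load(0.95))
```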
Any comment would be appreciated.
[1] https://github.com/pfreixes/cpython/commit/5fef3cae043abd62165ce40b181286e18f5fb19c
[2] https://www.npmjs.com/package/toobusy
[3] https://www.npmjs.com/package/toobusy#tunable-parameters
--
--pau
_______________________________________________
Async-sig mailing list
Async-sig@python.org
https://mail.python.org/mailman/listinfo/async-sig
Code of Conduct: https://www.python.org/psf/codeofconduct/