Hi guys,
I would appreciate any feedback on the idea of implementing a new
load function that tells you how saturated your reactor is.
I have a proof of concept [1] of how the load function might be
implemented in the asyncio event loop.
The idea is to provide a method that can be used to ask about the load
of the reactor over a given time window. This implementation returns
the load taking into account the last 60 seconds, but it could easily
be extended to return 5-minute, 15-minute, or other windows.
This method can help services built on top of asyncio to implement
back-pressure mechanisms that take into account a metric coming from
the loop itself, instead of inferring the load from metrics provided by
external agents, such as CPU usage, system load average, or others.
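To illustrate the kind of back pressure I mean, here is a minimal,
self-contained sketch. Note that loop_load() below is just a stub
standing in for the PoC's method; the function name and the 0.9
threshold are only for illustration:

```python
import asyncio

LOAD_THRESHOLD = 0.9  # shed work when the loop is above 90% load


def loop_load():
    # Stub standing in for the PoC's loop.load(); a real implementation
    # would report the fraction of the last 60 seconds the loop was busy.
    return 0.95


async def handle_request(payload):
    if loop_load() > LOAD_THRESHOLD:
        # Back pressure: refuse new work instead of queueing it.
        return 503, 'too busy'
    await asyncio.sleep(0)  # stand-in for the real work
    return 200, payload.upper()


loop = asyncio.new_event_loop()
status, body = loop.run_until_complete(handle_request('hello'))
loop.close()
print(status, body)  # -> 503 too busy, because the stubbed load is 0.95
```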
There are alternatives in other languages that address this situation
using the lag of a scheduled callback, which grows when the reactor is
saturated. The best-known implementation is toobusy [2], a Node.js
package.
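For comparison, the lag-based technique can be reproduced on top of
asyncio in a few lines. This is only a sketch of the toobusy approach
(the LagMonitor class is my own invention, not part of the PoC): a
callback is scheduled every `interval` seconds, and the distance
between when it should have fired and when it actually fired is the
scheduling lag:

```python
import asyncio
import time


class LagMonitor:
    """Measure event-loop scheduling lag, toobusy-style (a sketch)."""

    def __init__(self, loop, interval=0.1):
        self._loop = loop
        self._interval = interval
        self.max_lag = 0.0  # worst scheduling delay seen, in seconds

    def start(self):
        self._expected = time.monotonic() + self._interval
        self._loop.call_later(self._interval, self._tick)

    def _tick(self):
        now = time.monotonic()
        # How late the callback fired; close to 0 on an idle loop.
        lag = max(0.0, now - self._expected)
        self.max_lag = max(self.max_lag, lag)
        self._expected = now + self._interval
        self._loop.call_later(self._interval, self._tick)


async def busy_work():
    time.sleep(0.3)            # blocks the loop, creating lag
    await asyncio.sleep(0.05)  # give the monitor a chance to tick


loop = asyncio.new_event_loop()
monitor = LagMonitor(loop, interval=0.1)
monitor.start()
loop.run_until_complete(busy_work())
loop.close()
print(monitor.max_lag)  # roughly 0.2 on an unloaded machine
```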
IMHO the solution provided by toobusy has a strong dependency on the
hardware, since the maximum lag allowed has to be tuned in terms of
milliseconds [3]. In the PoC presented here, the user can instead use a
hardware-independent value expressing the load as a percentage, e.g. 0.9.
Any comments would be appreciated.
[1] https://github.com/pfreixes/cpython/commit/5fef3cae043abd62165ce40b181286e1…
[2] https://www.npmjs.com/package/toobusy
[3] https://www.npmjs.com/package/toobusy#tunable-parameters
--
--pau
I have one comment regarding what Nathaniel Smith said:
"""
Of course, if
you really want an end to end measure you can do things like instrument
your actual logic, see how fast you're replying to http requests or
whatever, which is even more valid but creates complications because some
requests are supposed to take longer than others, etc.
"""
You can't always use HTTP response times or other such metrics to measure
how busy your worker is. Some examples of misleading metrics:
- Your service may depend on external services that are the ones making
you slow. In that case, scaling up because of a spike in your HTTP
response time doesn't help; it's a waste of resources.
- You can't rely on metrics like CPU, memory, etc. Of course, they may
mean something, but how do you know whether it's your worker using the
CPU or some other random process: one an admin started after connecting
manually, a Redis instance running on the same machine, or anything
else on a long "please don't do that in PROD" list :)
So, IMO, how busy the loop is (I don't know the correct name for this
metric) is specific to that worker, and it will tell you that your
service is dying because it is receiving too many asyncio.Tasks to
serve within what you consider a normal time window. For example, if
you have an API where a response time of more than 1 second is not
acceptable, and that loop metric stays stable above 2 seconds, you know
you have to do something (scale up, improve something, etc.).
On Sat, Aug 12, 2017 at 6:03 PM <async-sig-request(a)python.org> wrote:
> Date: Fri, 11 Aug 2017 11:04:43 -0700
> From: Nathaniel Smith <njs(a)pobox.com>
> To: Pau Freixes <pfreixes(a)gmail.com>
> Cc: async-sig(a)python.org
> Subject: Re: [Async-sig] Feedback, loop.load() function
>
> It looks like your "load average" is computing something very different
> than the traditional Unix "load average". If I'm reading right, yours is a
> measure of what percentage of the time the loop spent sleeping waiting for
> I/O, taken over the last 60 ticks of a 1 second timer (so generally
> slightly longer than 60 seconds). The traditional Unix load average is an
> exponentially weighted moving average of the length of the run queue.
>
> Is one of those definitions better for your goal of detecting when to shed
> load? I don't know. But calling them the same thing is pretty confusing
> :-). The Unix version also has the nice property that it can actually go
> above 1; yours doesn't distinguish between a service whose load is at
> exactly 100% of capacity and barely keeping up, versus one that's at 200%
> of capacity and melting down. But for load shedding maybe you always want
> your tripwire to be below that anyway.
>
> More broadly we might ask what's the best possible metric for this purpose
> -- how do we judge? A nice thing about the JavaScript library you mention is
> that scheduling delay is a real thing that directly impacts quality of
> service -- it's more of an "end to end" measure in a sense. Of course, if
> you really want an end to end measure you can do things like instrument
> your actual logic, see how fast you're replying to http requests or
> whatever, which is even more valid but creates complications because some
> requests are supposed to take longer than others, etc. I don't know which
> design goals are important for real operations.
>
> On Aug 6, 2017 3:57 PM, "Pau Freixes" <pfreixes(a)gmail.com> wrote:
>
> > [...]
I want to share a pattern I came up with for handling interrupt
signals in asyncio to see if you had any feedback (ways to make it
easier, similar approaches, etc).
I wanted something that was easy to check and reason about. I'm
already familiar with some of the pitfalls in handling signals, for
example as described in Nathaniel's Control-C blog post announced
here:
https://mail.python.org/pipermail/async-sig/2017-April/thread.html
The basic idea is to create a Future to run alongside the main
coroutine whose only purpose is to "catch" the signal, and then call:

    asyncio.wait(futures, return_when=asyncio.FIRST_COMPLETED)
When a signal is received, both tasks stop, and then you have access
to the main task (which will be pending) for things like cleanup and
inspection.
One advantage of this approach is that it lets you put all your
cleanup logic in the main program instead of putting some of it in the
signal handler. You also don't need to worry about things like
handling KeyboardInterrupt at arbitrary points in your code.
I'm including the code at bottom.
On the topic of asyncio.run() that I mentioned in an earlier email
[1], it doesn't look like the run() API posted in PR #465 [2] has
hooks to support what I'm describing (but I could be wrong). So maybe
this is another use case that the future API should contemplate.
--Chris
[1] https://mail.python.org/pipermail/async-sig/2017-August/000373.html
[2] https://github.com/python/asyncio/pull/465
import asyncio
import io
import signal


def _cleanup(loop):
    try:
        loop.run_until_complete(loop.shutdown_asyncgens())
    finally:
        loop.close()


def handle_sigint(future):
    future.set_result(signal.SIGINT)


async def run():
    print('running...')
    await asyncio.sleep(1000000)


def get_message(sig, task):
    stream = io.StringIO()
    task.print_stack(file=stream)
    traceback = stream.getvalue()
    return f'interrupted by {sig.name}:\n{traceback}'


def main(coro):
    loop = asyncio.new_event_loop()
    try:
        # This is made truthy if the loop is interrupted by a signal.
        interrupted = []
        future = asyncio.Future(loop=loop)
        future.add_done_callback(lambda future: interrupted.append(1))
        loop.add_signal_handler(signal.SIGINT, handle_sigint, future)
        futures = [future, coro]
        wait_future = asyncio.wait(futures, loop=loop,
                                   return_when=asyncio.FIRST_COMPLETED)
        done, pending = loop.run_until_complete(wait_future)
        if interrupted:
            # Do whatever cleanup you want here and/or get the stacktrace
            # of the interrupted main task.
            sig = done.pop().result()
            task = pending.pop()
            msg = get_message(sig, task)
            task.cancel()
            raise KeyboardInterrupt(msg)
    finally:
        _cleanup(loop)


main(run())
Below is what the code above outputs if you run it and then press Control-C:
running...
^CTraceback (most recent call last):
File "test-signal.py", line 54, in <module>
main(run())
File "test-signal.py", line 49, in main
raise KeyboardInterrupt(msg)
KeyboardInterrupt: interrupted by SIGINT:
Stack for <Task pending coro=<run() running at test-signal.py:17>
wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at
0x10fe9b9a8>()]>> (most recent call last):
File "test-signal.py", line 17, in run
await asyncio.sleep(1000000)
I have a question about PEP 525 (Asynchronous Generators) which I'm
sure has a simple answer, but I didn't see it in the PEP or final
discussion:
https://mail.python.org/pipermail/python-dev/2016-September/146265.html
Basically, why is the API such that loop.shutdown_asyncgens() must be
called manually? For example, why can't it be called automatically as
part of close(), which seems like it would be a friendlier API and
more helpful to the common case?
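To make the question concrete, here is a sketch of what I mean: a
wrapper (the close_loop name is mine, just for illustration) that
behaves the way a friendlier loop.close() might, finalizing pending
async generators before closing:

```python
import asyncio

finalized = []  # records whether the generator's cleanup ran


def close_loop(loop):
    # What a friendlier loop.close() might do: finalize any pending
    # async generators, then close the loop.
    try:
        loop.run_until_complete(loop.shutdown_asyncgens())
    finally:
        loop.close()


async def agen():
    try:
        yield 1
        yield 2
    finally:
        finalized.append(True)


async def use_it():
    gen = agen()
    async for value in gen:
        break  # abandon the generator mid-iteration
    return gen


loop = asyncio.new_event_loop()
gen = loop.run_until_complete(use_it())  # keep gen alive until shutdown
close_loop(loop)
print(finalized)  # [True]: cleanup ran inside close_loop(), no error
```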
I was trying asynchronous iterators in my code and getting the following error:
Exception ignored in: <generator object Queue.get at 0x7f950a667678>
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/asyncio/queues.py", line 169, in get
    getter.cancel()  # Just in case getter is not done yet.
  File "/usr/local/lib/python3.6/asyncio/base_events.py", line 574, in call_soon
    self._check_closed()
  File "/usr/local/lib/python3.6/asyncio/base_events.py", line 357, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
Calling loop.shutdown_asyncgens() made the error go away, but it seems
a little obscure that, by adding an asynchronous iterator somewhere in
your code, you have to remember to ensure that that call is present
before loop.close() is called (and the exception message doesn't
provide a good hint).
Is there any disadvantage to always calling loop.shutdown_asyncgens()
(i.e. even if it's not needed)? And why might someone need to call it
at a different time?
Thanks,
--Chris