[Async-sig] Blog post: Timeouts and cancellation for humans

Nathaniel Smith njs at pobox.com
Sun Jan 14 06:33:44 EST 2018


On Fri, Jan 12, 2018 at 4:17 AM, Chris Jerdonek
<chris.jerdonek at gmail.com> wrote:
> Thanks, Nathaniel. Very instructive, thought-provoking write-up!
>
> One thing occurred to me around the time of reading this passage:
>
>> "Once the cancel token is triggered, then all future operations on that token are cancelled, so the call to ws.close doesn't get stuck. It's a less error-prone paradigm. ... If you follow the path we did in this blog post, and start by thinking about applying a timeout to a complex operation composed out of multiple blocking calls, then it's obvious that if the first call uses up the whole timeout budget, then any future calls should fail immediately."
>
> One case where it's not clear how it should be addressed is the following.
> It's something I've wrestled with in the context of asyncio, and it
> doesn't seem to be raised as a possibility in your write-up.
>
> Say you have a complex operation that you want to be able to timeout
> or cancel, but the process of cleanup / cancelling might also require
> a certain amount of time that you'd want to allow for (likely a
> smaller amount in normal circumstances). Then it seems like you'd want
> to be able to allocate a separate timeout for the clean-up portion
> (independent of the timeout allotted for the original operation).
>
> It's not clear to me how this case would best be handled with the
> primitives you described. In your text above ("then any future calls
> should fail immediately"), without any changes, it seems there
> wouldn't be "time" for any clean-up to complete.
>
> With asyncio, one way to handle this is to await on a task with a
> smaller timeout after calling task.cancel(). That lets you assign a
> different timeout to waiting for cancellation to complete.
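(As a concrete sketch of that asyncio pattern -- the function and
variable names below are invented for illustration, and the tiny sleep
durations just stand in for real work:)

```python
import asyncio

cleanup_finished = False

async def complex_operation():
    # Hypothetical operation whose cancellation path itself takes time.
    global cleanup_finished
    try:
        await asyncio.sleep(10)        # the main work
    except asyncio.CancelledError:
        # cleanup that itself needs a little time to run
        await asyncio.sleep(0.05)
        cleanup_finished = True
        raise

async def main():
    task = asyncio.ensure_future(complex_operation())
    await asyncio.sleep(0.01)          # let the operation get started
    task.cancel()
    # Wait for cancellation/cleanup to complete, with its own,
    # smaller timeout, as described above.
    done, pending = await asyncio.wait({task}, timeout=1.0)
    return task in done

completed = asyncio.run(main())
```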

You can get these semantics using the "shielding" feature, which the
post discusses a bit later:

try:
    await do_some_stuff()
finally:
    # Always give this 30 seconds to clean up, even if we've
    # been cancelled
    with trio.move_on_after(30) as cscope:
        cscope.shield = True
        await do_cleanup()

Here the inner scope "hides" the code inside it from any external
cancel scopes, so it can continue executing even if the overall
context has been cancelled.

However, I think this is probably a code smell. Like all code smells,
there are probably cases where it's the right thing to do, but when
you see it you should stop and think carefully. If you're writing code
like this, it means that multiple layers in your code are implementing
timeout policies, and those policies might end up fighting with each
other. What if the caller really needs this to finish in 15 seconds?
If you can move the timeout handling into a single layer, then I
suspect that will make your
program easier to understand and maintain. OTOH, if you decide you
want it, the code above works :-). I'm not 100% sure here; I'd
definitely be interested to hear about more use cases.
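To make the "fighting" concrete, here's a sketch of the problem using
stdlib asyncio instead of trio (the names and the short sleep durations
are invented for illustration): a cleanup layer that insists on its own
time budget can blow right past the caller's deadline.

```python
import asyncio
import time

async def operation():
    try:
        await asyncio.sleep(10)            # the "real" work
    finally:
        # This layer insists on its own cleanup budget (0.2s here),
        # regardless of what the caller wanted.
        await asyncio.sleep(0.2)

async def caller():
    start = time.monotonic()
    try:
        # The caller wants the *whole thing* done in 0.05 seconds...
        await asyncio.wait_for(operation(), timeout=0.05)
    except asyncio.TimeoutError:
        pass
    # ...but wait_for has to wait for the cancelled task to finish,
    # so the cleanup layer's 0.2s budget wins anyway.
    return time.monotonic() - start

elapsed = asyncio.run(caller())
```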

One thing I've thought about that might help is adding a kind of "soft
cancelled" state to the cancel scopes, inspired by the "graceful
shutdown" mode that you'll often see in servers where you stop
accepting new connections, then try to finish up old ones (with some
time limit). So in this case you might mark 'do_some_stuff()' as
cancelled immediately when the 'soft cancel' phase begins, but let
the 'do_cleanup' code keep running until the grace period expires and
the region is hard-cancelled. This idea isn't fully baked yet though.
(There's some more mumbling about this at
https://github.com/python-trio/trio/issues/147.)
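(For a rough feel of the two-phase idea, here's a sketch using stdlib
asyncio rather than any real trio API -- the helper name and timings
are invented: the "soft" phase lets existing work drain within a grace
period, and the "hard" phase cancels whatever is left.)

```python
import asyncio

async def graceful_shutdown(tasks, grace_period):
    # Soft phase: existing work may finish within the grace period.
    done, pending = await asyncio.wait(tasks, timeout=grace_period)
    # Hard phase: anything still running gets cancelled outright.
    for t in pending:
        t.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return len(done), len(pending)

async def main():
    quick = asyncio.ensure_future(asyncio.sleep(0.01))   # finishes in time
    slow = asyncio.ensure_future(asyncio.sleep(10))      # does not
    return await graceful_shutdown({quick, slow}, grace_period=0.1)

finished, cancelled = asyncio.run(main())
```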

-n

-- 
Nathaniel J. Smith -- https://vorpus.org

