[Async-sig] Blog post: Timeouts and cancellation for humans

Nick Badger nbadger1 at gmail.com
Mon Jan 15 01:08:26 EST 2018


Quick preface: there are definitely times when code "smell" really isn't --
nothing's perfect! -- and sometimes some system component is unavoidably
inelegant. I think this is oftentimes (but not always) the result of
scoping: clearly I couldn't decide, as a library author, that "it's all
just broken" and rip out everything from OS to TCP to language syntax and
semantics just to make my API prettier. So I pragmatically downscope the
problem space, and it forces me to make design decisions to accommodate the
rest of the universe. And that's okay!

With that being said, I'm still not convinced that the
double-timeout-shutdown isn't an indication of upstream code smell. From a
practical standpoint, for the purposes of this discussion it really doesn't
matter; Trio et al can't go mucking about in the TCP stack internals, so we
do the best we can. But I'm willing to entertain the possibility (actually
I think it's highly likely) that there are better solutions to the
aforementioned problems than the ones used by (for example) TCP and TLS.
But that rabbit hole goes very, very deep, so to circle back, what I'm
trying to say is this:

   - I share the inclination that shielding against cancellation (or any
   equivalent workaround) is likely code smell
   - However, I personally suspect the source of that smell is upstream, in
   the network protocols themselves
   - Given that, I think some amount of smell in downstream libraries like
   Trio is unavoidable

To that end, I really like Trio's existing approach. Shielding should
definitely be used sparingly, but I think it's a justifiable, pragmatic
compromise when it comes to dealing with not-quite-perfect protocols on
even-less-perfect networks. And I think the connection close semantics Trio
provides for these situations -- attempt to close gracefully, but if
cancelled, still close unilaterally to free local resources -- is an
excellent approach. But it also "lucks out" a bit, because freeing local
resources is many orders of magnitude faster than the enclosing timeout is
likely to be, so it's effectively a "free" operation. The relative
timescales are a critical observation; if freeing local resources took one
second out of a ten-second timeout, I think you'd be stuck asking the same
question there, too.
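A minimal sketch of the close semantics described above. It uses asyncio rather than Trio only so that it runs with the standard library. `HypotheticalConnection`, `aclose`, and `close_with_timeout` are made-up names for illustration, not Trio's or any websocket library's API. The graceful handshake may block indefinitely. If the enclosing timeout cancels it, the `finally` clause still frees local resources, and because that step is purely local it costs essentially nothing relative to the timeout. Note that asyncio's cancellation is not stateful the way Trio's is, so this only illustrates the try/finally shape of the approach.

```python
import asyncio


class HypotheticalConnection:
    """Toy stand-in for a connection with a graceful goodbye handshake.

    Hypothetical: not a real Trio or websocket API.
    """

    def __init__(self, handshake_seconds: float) -> None:
        self.handshake_seconds = handshake_seconds
        self.said_goodbye = False     # did the graceful handshake finish?
        self.resources_freed = False  # were local resources released?

    async def _goodbye_handshake(self) -> None:
        # Stands in for sending a goodbye message and waiting for the peer's
        # reply over an unreliable network; it may take arbitrarily long.
        await asyncio.sleep(self.handshake_seconds)
        self.said_goodbye = True

    def _free_local_resources(self) -> None:
        # Purely local and synchronous (e.g. closing a file descriptor), so it
        # cannot block and uses a negligible slice of any realistic timeout.
        self.resources_freed = True

    async def aclose(self) -> None:
        try:
            # Attempt a graceful shutdown; this may block for a long time.
            await self._goodbye_handshake()
        finally:
            # Runs on success *and* on cancellation: always free local resources.
            self._free_local_resources()


async def close_with_timeout(conn: HypotheticalConnection, timeout: float) -> None:
    try:
        await asyncio.wait_for(conn.aclose(), timeout)
    except asyncio.TimeoutError:
        pass  # Graceful path abandoned, but local resources were still freed.


if __name__ == "__main__":
    slow = HypotheticalConnection(handshake_seconds=10)
    asyncio.run(close_with_timeout(slow, timeout=0.05))
    print(slow.said_goodbye, slow.resources_freed)  # False True

    fast = HypotheticalConnection(handshake_seconds=0)
    asyncio.run(close_with_timeout(fast, timeout=1))
    print(fast.said_goodbye, fast.resources_freed)  # True True
```

If `_free_local_resources` instead had to await something slow, the question of how much of the timeout to reserve for it would come right back.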

Nick Badger
https://www.nickbadger.com

2018-01-14 20:52 GMT-08:00 Nathaniel Smith <njs at pobox.com>:

> On Sun, Jan 14, 2018 at 2:45 PM, Nick Badger <nbadger1 at gmail.com> wrote:
> >> However, I think this is probably a code smell. Like all code smells,
> >> there are probably cases where it's the right thing to do, but when
> >> you see it you should stop and think carefully.
> >
> > Huh. That's a really good point. But I'm not sure the source of the
> > smell is the code that needs the shield logic -- I think this might
> > instead be indicative of upstream code smell. Put a bit more concretely:
> > if you're writing a protocol for an unreliable network (and of course,
> > every network is unreliable), requiring a closure operation to transmit
> > something over that network is inherently problematic, because it
> > inevitably leads to multiple-stage timeouts or ungraceful shutdowns.
>
> I wouldn't go that far -- there are actually good reasons to design
> protocols like this.
>
> SSL/TLS is a protocol that has a "goodbye" message (they call it
> "close-notify"). According to the spec [1], sending this is mandatory
> if you want to cleanly shut down an SSL/TLS connection. Why? Well, say
> I send you a message, "Should I buy more bitcoin?" and your reply is
> "Yes, but only if the price drops below $XX". Unbeknownst to us, we're
> being MITMed. Fortunately, we used SSL/TLS, so the MITM can't alter
> what we're saying. But they can manipulate the network; for example,
> they could cause our connection to drop after the first 3 bytes of
> your message, so your answer gets truncated and I think you just said
> "Yes" -- which is very different! But, close-notify saves us -- or at
> least contains the damage. Since I know that you're supposed to send a
> close-notify at the end of your connection, and I didn't get one, I
> can tell that this is a truncated message. I can't tell what the rest
> was going to be, but at least I know the message I got isn't the
> message you intended to send. And an attacker can't forge a
> close-notify message, because they're cryptographically authenticated
> like all the data we send.
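A toy model of why an authenticated close-notify matters, offered as a minimal sketch. HMAC-SHA256 over (sequence number, record type, payload) stands in for TLS record protection. The key, the record types, and the `send`/`receive` helpers are hypothetical, not the real TLS wire format or Python's `ssl` API. A receiver that sees the stream end without a valid close-notify reports it as possibly truncated, and a forged close-notify fails authentication.

```python
import hashlib
import hmac

# Hypothetical session key; in real TLS, keys come from the handshake.
KEY = b"hypothetical-session-key"
DATA, CLOSE_NOTIFY = b"D", b"C"  # toy record types, not real TLS encodings


def _tag(seq, kind, payload):
    # Authenticate sequence number, record type, and payload together, so a
    # MITM can neither forge, reorder, nor relabel records without the key.
    msg = seq.to_bytes(8, "big") + kind + payload
    return hmac.new(KEY, msg, hashlib.sha256).digest()


def send(chunks):
    """Encode message chunks as authenticated records, ending with close-notify."""
    records = [(DATA, c, _tag(i, DATA, c)) for i, c in enumerate(chunks)]
    n = len(records)
    records.append((CLOSE_NOTIFY, b"", _tag(n, CLOSE_NOTIFY, b"")))
    return records


def receive(records):
    """Return (data, clean_close); clean_close is False if the stream was cut short."""
    data = b""
    for seq, (kind, payload, tag) in enumerate(records):
        if not hmac.compare_digest(tag, _tag(seq, kind, payload)):
            raise ValueError(f"record {seq} failed authentication")
        if kind == CLOSE_NOTIFY:
            return data, True
        data += payload
    # The connection ended with no authenticated close-notify: what we have
    # may be only a prefix of what the peer meant to send.
    return data, False


if __name__ == "__main__":
    records = send([b"Yes", b", but only if the price drops below $XX"])
    print(receive(records))      # full message, clean close
    print(receive(records[:1]))  # MITM dropped the connection: (b'Yes', False)
```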
>
> In websockets, the goodbye handshake is used to work around a nasty
> case that can happen with common TCP stacks (like, all of them):
>
> 1. A sends a message to B.
> 2. A is done after that, so it closes the connection.
> 3. Just then, B sends a message to A, like maybe a regular ping on some
> timer.
> 4. A's TCP stack receives data on a closed connection, goes "huh
> wut?", and sends a RST packet.
> 5. B goes to read the last message A sent before they closed the
> connection... but whoops it's gone! the RST packet caused both TCP
> stacks to wipe out all their buffered data associated with this
> connection.
>
> So if you have a protocol that's used for streaming indefinite amounts
> of data in both directions and supports stuff like pings, you kind of
> have to have a goodbye handshake to avoid TCP stacks accidentally
> corrupting your data. (The goodbye handshake can also help make sure
> that clients end up carrying CLOSE-WAIT states instead of servers, but
> that's a finicky and less important issue.)
>
> Of course, it is absolutely true that networks are unreliable, so when
> your protocol specifies a goodbye handshake like this then
> implementations still need to have some way to cope if their peer
> closes the connection unexpectedly, and they may need to unilaterally
> close the connection in some circumstances no matter what the spec
> says. Correctly handling every possible case here quickly becomes,
> like, infinitely complicated. But nonetheless, as a library author one
> has to try to provide some reasonable behavior by default (while
> knowing that some users will end up needing to tweak things to handle
> special circumstances).
>
> My tentative approach so far in Trio is (a) make cancellation stateful
> like discussed in the blog post, because accidentally hanging forever
> just can't be a good default, (b) in the "trio.abc.AsyncResource"
> interface that complex objects like trio.SSLStream implement (and we
> recommend libraries implement too), the semantics for the aclose and
> __aexit__ methods are that they're allowed to block forever trying to
> do a graceful shutdown, but if cancelled then they have to return
> promptly *but still freeing any underlying resources*, possibly in a
> non-graceful way. So if you write straightforward code like:
>
> with trio.move_on_after(10):
>     async with open_websocket_connection(...):
>         ...
>
> then it tries to do a proper websocket goodbye handshake by default,
> but if the timeout expires then it gives up and immediately closes the
> socket. It's not perfect, but it seems like a better default than
> anything else I can think of.
>
> -n
>
> [1] There's also this whole mess where many SSL/TLS implementations
> ignore the spec and don't bother sending close-notify. This is *kinda*
> justifiable because the original and most popular use for SSL/TLS is
> for wrapping HTTP connections, and HTTP has its own ways of signaling
> the end of the connection that are already transmitted through the
> encrypted tunnel, so the SSL/TLS end-of-connection handshake is
> redundant. Therefore lots of implementations went ahead and ignored
> the spec (including Python's ssl module!), so now if you're
> implementing HTTPS you have to do the same for interoperability. But
> the SSL/TLS spec can't assume you're using HTTP on top: its contract
> is basically "socket semantics, but cryptographically authenticated".
> And close() is part of socket semantics, so it kind of has to make
> close() cryptographically authenticated too. (trio.SSLStream handles
> this by implementing the standard compliant behavior by default, but
> you can pass https_compatible=True to the constructor to get the
> HTTPS-style behavior.)
>
> --
> Nathaniel J. Smith -- https://vorpus.org
>