[Async-sig] Blog post: Timeouts and cancellation for humans

Sun Jan 14 23:52:19 EST 2018

On Sun, Jan 14, 2018 at 2:45 PM, Nick Badger <nbadger1 at gmail.com> wrote:
>> However, I think this is probably a code smell. Like all code smells,
>> there are probably cases where it's the right thing to do, but when
>> you see it you should stop and think carefully.
>
> Huh. That's a really good point. But I'm not sure the source of the smell is
> the code that needs the shield logic -- I think this might instead be
> indicative of upstream code smell. Put a bit more concretely: if you're
> writing a protocol for an unreliable network (and of course, every network
> is unreliable), requiring a closure operation to transmit something over
> that network is inherently problematic, because it inevitably leads to
> multiple-stage timeouts or ungraceful shutdowns.

I wouldn't go that far -- there are actually good reasons to design
protocols like this.

SSL/TLS is a protocol that has a "goodbye" message (they call it
"close-notify"). According to the spec [1], sending this is mandatory
if you want to cleanly shut down an SSL/TLS connection. Why? Well, say
I send you a message, "Should I buy more bitcoin?" and your reply is
"Yes, but only if the price drops below $XX". Unbeknownst to us, we're
being MITMed. Fortunately, we used SSL/TLS, so the MITM can't alter
what we're saying. But they can manipulate the network; for example,
they could cause our connection to drop after the first 3 bytes of
your message, so your answer gets truncated and I think you just said
"Yes" -- which is very different! But, close-notify saves us -- or at
least contains the damage. Since I know that you're supposed to send a
close-notify at the end of your connection, and I didn't get one, I
can tell that this is a truncated message. I can't tell what the rest
was going to be, but at least I know the message I got isn't the
message you intended to send. And an attacker can't forge a
close-notify message, because they're cryptographically authenticated
like all the data we send.

In websockets, the goodbye handshake is used to work around a nasty
case that can happen with common TCP stacks (like, all of them):

1. A sends a message to B.
2. A is done after that, so it closes the connection.
3. Just then, B sends a message to A, like maybe a regular ping on some timer.
4. A's TCP stack receives data on a closed connection, goes "huh
wut?", and sends a RST packet.
5. B goes to read the last message A sent before they closed the
connection... but whoops it's gone! the RST packet caused both TCP
stacks to wipe out all their buffered data associated with this
connection.

So if you have a protocol that's used for streaming indefinite amounts
of data in both directions and supports stuff like pings, you kind of
have to have a goodbye handshake to avoid TCP stacks accidentally
corrupting your data. (The goodbye handshake can also help make sure
that clients end up carrying CLOSE-WAIT states instead of servers, but
that's a finicky and less important issue.)

Of course, it is absolutely true that networks are unreliable, so when
your protocol specifies a goodbye handshake like this then
implementations still need to have some way to cope if their peer
closes the connection unexpectedly, and they may need to unilaterally
close the connection in some circumstances no matter what the spec
says. Correctly handling every possible case here quickly becomes,
like, infinitely complicated. But nonetheless, as a library author one
has to try to provide some reasonable behavior by default (while
knowing that some users will end up needing to tweak things to handle
special circumstances).

My tentative approach so far in Trio is (a) make cancellation stateful
like discussed in the blog post, because accidentally hanging forever
just can't be a good default, (b) in the "trio.abc.AsyncResource"
interface that complex objects like trio.SSLStream implement (and we
recommend libraries implement too), the semantics for the aclose and
__aexit__ methods are that they're allowed to block forever trying to
do a graceful shutdown, but if cancelled then they have to return
promptly *but still freeing any underlying resources*, possibly in a
non-graceful way. So if you write straightforward code like:

with trio.move_on_after(10):
    async with open_websocket_connection(...):
        ...

then it tries to do a proper websocket goodbye handshake by default,
but if the timeout expires then it gives up and immediately closes the
socket. It's not perfect, but it seems like a better default than
anything else I can think of.

-n

[1] There's also this whole mess where many SSL/TLS implementations
ignore the spec and don't bother sending close-notify. This is *kinda*
justifiable because the original and most popular use for SSL/TLS is
for wrapping HTTP connections, and HTTP has its own ways of signaling
the end of the connection that are already transmitted through the
encrypted tunnel, so the SSL/TLS end-of-connection handshake is
redundant. Therefore lots of implementations went ahead and ignored
the spec (including Python's ssl module!), so now if you're
implementing HTTPS you have to do the same for interoperability. But
the SSL/TLS spec can't assume you're using HTTP on top: it's contract
is basically "socket semantics, but cryptographically authenticated".
And close() is part of socket semantics, so it kind of has to make
close() cryptographically authenticated too. (trio.SSLStream handles
this by implementing the standard compliant behavior by default, but
you can pass https_compatible=True to the constructor to get the
HTTPS-style behavior.)

-- 
Nathaniel J. Smith -- https://vorpus.org