[New-bugs-announce] [issue43577] Deadlock when using SSLContext._msg_callback and SSLContext.sni_callback

Andrew Dailey report at bugs.python.org
Sun Mar 21 01:00:42 EDT 2021


New submission from Andrew Dailey <steveday168 at gmail.com>:

Hello,

I think I might've stumbled onto an oversight in how an SSLSocket handles overwriting its SSLContext within an sni_callback. If both "_msg_callback" and "sni_callback" are defined on an SSLContext object and the sni_callback replaces the context with a new one, the interpreter locks up indefinitely. It fails to respond to keyboard interrupts and must be forcefully killed.

This seems to be a common use case of the sni_callback: create a new context with a different cert chain and attach it to the current socket (which replaces the existing one). If _msg_callback never gets defined on the original context then this deadlock never occurs. Curiously, if you assign the same _msg_callback to the new context before replacement, this also avoids the deadlock.
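
For illustration, a minimal sketch of the pattern described above, including the reported workaround of copying _msg_callback onto the new context before the swap. It is not the attached reproduction: make_server_context, alt_ctx and the callback bodies are hypothetical, no certificates are loaded, and no handshake is performed, so it only shows how the pieces are wired together. Whether the copy avoids the lock-up in every case is the reporter's observation, not something this sketch verifies.

```python
import ssl


def make_server_context():
    # A server-side context can be created without a cert chain;
    # a real server would call load_cert_chain() here.
    return ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)


def msg_callback(conn, direction, version, content_type, msg_type, data):
    # Undocumented hook (SSLContext._msg_callback), invoked per TLS message.
    pass


default_ctx = make_server_context()
default_ctx._msg_callback = msg_callback

# Hypothetical replacement context that would carry a different cert chain.
alt_ctx = make_server_context()


def sni_callback(sslobj, server_name, original_ctx):
    # Reported workaround: give the new context the same _msg_callback
    # before attaching it to the socket.
    alt_ctx._msg_callback = original_ctx._msg_callback
    sslobj.context = alt_ctx
    return None  # None tells the ssl module to continue the handshake


default_ctx.sni_callback = sni_callback
```

Dropping the line that copies _msg_callback gives the configuration that is reported to hang once a client actually connects.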

I've attached the most minimal reproduction I could come up with. I think the code within will probably do a better job explaining this problem than I've done here in prose. I've only tested it on a couple of Linux distros (Ubuntu Server and Void Linux) but the lock occurs 100% of the time in my experience.

In the brief time I've spent digging into the CPython source, I've come to understand that replacing the SSLContext on an SSLSocket isn't "just" a simple replacement but actually involves some OpenSSL mechanics (specifically, SSL_set_SSL_CTX) [0]. I'm wondering if maybe this context update routine isn't properly cleaning up whatever resources / references were being used by the msg_callback? Maybe this is even closer to an OpenSSL bug (or at least a gotcha)?

I also feel the need to explain why I'd even be using an undocumented property (SSLContext._msg_callback) in the first place. I'm trying to implement a program that automatically manages TLS certs on a socket via Let's Encrypt and the ACME protocol (RFC8555). Part of this process involves serving up a specific cert when a connection requests the acme-tls/1 ALPN protocol. Given the existing Python SSL API, I don't believe there is any way for me to do this "correctly".

The documentation for SSLContext.sni_callback [1] mentions that the selected_alpn_protocol function should be usable within the callback but I don't think that is quite true. According to the OpenSSL docs [2]:

"Several callbacks are executed during ClientHello processing, including the ClientHello, ALPN, and servername callbacks. The ClientHello callback is executed first, then the servername callback, followed by the ALPN callback."
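
To make the ordering problem concrete, here is a rough sketch of the approach the documentation seems to suggest: checking the ALPN protocol from inside the sni_callback. The names acme_ctx and ACME_ALPN are hypothetical and no certificates are loaded. If the servername callback really does run before the ALPN callback, selected_alpn_protocol() would be expected to return None at this point, so the acme-tls/1 branch would never be taken during a real handshake.

```python
import ssl

ACME_ALPN = "acme-tls/1"  # ALPN protocol name mentioned in this report

default_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
default_ctx.set_alpn_protocols([ACME_ALPN, "http/1.1"])

# Hypothetical context that would hold the ACME validation cert.
acme_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
acme_ctx.set_alpn_protocols([ACME_ALPN])


def sni_callback(sslobj, server_name, ctx):
    # Assumption under test: ALPN has already been negotiated here.
    # Per the OpenSSL ordering quoted above, it likely has not.
    if sslobj.selected_alpn_protocol() == ACME_ALPN:
        sslobj.context = acme_ctx
    return None  # continue the handshake either way


default_ctx.sni_callback = sni_callback
```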

If there is a better way for me to identify a specific ALPN protocol _before_ the sni_callback, I could definitely use the guidance. That would avoid this deadlock altogether (even though it'd still be waiting to catch someone else...).

This is my first Python issue so I hope what I've supplied makes sense. If there is anything more I can do to help or provide more info, please let me know.

[0] https://github.com/python/cpython/blob/3.9/Modules/_ssl.c#L2194
[1] https://docs.python.org/3/library/ssl.html#ssl.SSLContext.sni_callback
[2] https://www.openssl.org/docs/man1.1.1/man3/SSL_CTX_set_tlsext_servername_callback.html

----------
assignee: christian.heimes
components: SSL
files: deadlock.zip
messages: 389216
nosy: christian.heimes, theandrew168
priority: normal
severity: normal
status: open
title: Deadlock when using SSLContext._msg_callback and SSLContext.sni_callback
type: behavior
versions: Python 3.8, Python 3.9
Added file: https://bugs.python.org/file49897/deadlock.zip

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue43577>
_______________________________________

