[Twisted-Python] startTLS errors not propagating to Factory
![](https://secure.gravatar.com/avatar/e5a514e14e44913930aa1ac15f508746.jpg?s=120&d=mm&r=g)
Hi folks, I've been investigating https://github.com/matrix-org/synapse/issues/9566, which amounts to: "when there is a TLS error connecting to the SMTP server, the resultant exception is unreadable". I think I've traced the problem to the fact that SMTPSenderFactory.clientConnectionFailed is being called with an unhelpful ConnectionAborted rather than anything more descriptive. I've then reproduced that with a simpler test case: see https://gist.github.com/richvdh/909761ff5dab23f0873eeddd7936a740. As you can see, the output is: "Factory lost connection. Reason: Connection was aborted locally using ITCPTransport.abortConnection." This seems to be thanks to TLSMemoryBIOProtocol.failVerification, which stashes the error and calls abortConnection(): https://github.com/twisted/twisted/blob/trunk/src/twisted/protocols/tls.py#L.... At this point I'm struggling. Is the SMTP code holding the Factory wrong? Or is it reasonable to expect the verification error to propagate into clientConnectionFailed - in which case, how could this work? Thanks for your thoughts! Richard
![](https://secure.gravatar.com/avatar/e1554622707bedd9202884900430b838.jpg?s=120&d=mm&r=g)
Hi Richard, Sorry for the delayed response here. This is a bug in Twisted, and I think it boils down to this line: https://github.com/twisted/twisted/blob/3c868ac11786eef7ea269caa3056f0085412... <https://github.com/twisted/twisted/blob/3c868ac11786eef7ea269caa3056f0085412...> The code as written here appears to be expecting this sequence: failVerification is called with a reason containing a helpful OpenSSL verification error we save that reason as `self._reason` for reporting later, calling abortConnection() since the connection got aborted, we expect our underlying transport to call loseConnection on us we will then get a falsey reason [?!?!] and as such we will use self._reason instead Assumption 4 is nonsense and has never been true within Twisted as far as I know; connectionLost always gets a Failure, Failures are never falsey, so we will never use self._reason. To fix this we need to actually honor self._reason under the conditions we expect, i.e. it's a ConnectionAborted https://github.com/twisted/twisted/blob/3c868ac11786eef7ea269caa3056f0085412... <https://github.com/twisted/twisted/blob/3c868ac11786eef7ea269caa3056f0085412...> Does this make sense? (Will you be able to file / fix this bug?) Thanks for the report, -g
![](https://secure.gravatar.com/avatar/e5a514e14e44913930aa1ac15f508746.jpg?s=120&d=mm&r=g)
On 28/04/2021 07:06, Glyph wrote:
Thanks very much for your thoughts on this, Glyph - it's always helpful to have an insight into the intended design when trying to resolve this sort of thing. I don't follow your reasoning though. I think you might have misread the line you point to (https://github.com/twisted/twisted/blob/3c868ac11786eef7ea269caa3056f0085412...). It is "self._reason or reason" - ie, if self._reason is already set, it will take precedence over reason. In my tests at least, TLSMemoryBIOProtocol.connectionLost is doing exactly the right thing - it is called with an unhelpful reason, but substitutes back in the helpful reason which has already been stashed. Rather, the problem, as I see it, is that it's not TLSMemoryBIOProtocol.connectionLost that calls Factory.clientConnectionLost. That is done by tcp.Client.connectionLost, via one of tcp.Client's myriad of base classes, at https://github.com/twisted/twisted/blob/3c868ac11786eef7ea269caa3056f0085412.... Of course, that doesn't get the benefit of TLSMemoryBIOProtocol's reason switcheroo. I'm still not quite sure who is in the wrong here. Cheers Richard PS: yes, once we figure out what's going wrong here, I'll at least write up an issue...
![](https://secure.gravatar.com/avatar/e1554622707bedd9202884900430b838.jpg?s=120&d=mm&r=g)
> On Apr 28, 2021, at 2:43 PM, Richard van der Hoff <richard@matrix.org> wrote: > > On 28/04/2021 07:06, Glyph wrote: >>> Is the SMTP code holding the Factory wrong? Or is it reasonable to expect the verification error to propagate into clientConnectionFailed - in which case, how could this work? >>> >>> Thanks for your thoughts! >> Hi Richard, >> Sorry for the delayed response here. >> This is a bug in Twisted, and I think it boils down to this line: https://github.com/twisted/twisted/blob/3c868ac11786eef7ea269caa3056f00854128957/src/twisted/protocols/tls.py#L391 The code as written here appears to be expecting this sequence: >> 1. failVerification is called with a reason containing a helpful >> OpenSSL verification error >> 2. we save that reason as `self._reason` for reporting later, calling >> abortConnection() >> 3. since the connection got aborted, we expect our underlying transport >> to call loseConnection on us >> 4. we will then get a falsey reason [?!?!] and as such we will use >> self._reason instead >> Assumption 4 is nonsense and has never been true within Twisted as far as I know; connectionLost always gets a Failure, Failures are never falsey, so we will never use self._reason. To fix this we need to actually honor self._reason under the conditions we expect, i.e. it's a ConnectionAborted https://github.com/twisted/twisted/blob/3c868ac11786eef7ea269caa3056f00854128957/src/twisted/internet/error.py#L209 > > Thanks very much for your thoughts on this, Glyph - it's always helpful to have an insight into the intended design when trying to resolve this sort of thing. > > I don't follow your reasoning though. I think you might have misread the line you point to (https://github.com/twisted/twisted/blob/3c868ac11786eef7ea269caa3056f00854128957/src/twisted/protocols/tls.py#L391). It is "self._reason or reason" - ie, if self._reason is already set, it will take precedence over reason. Sigh. You're right, I read it backwards. > In my tests at least, TLSMemoryBIOProtocol.connectionLost is doing exactly the right thing - it is called with an unhelpful reason, but substitutes back in the helpful reason which has already been stashed. > > Rather, the problem, as I see it, is that it's not TLSMemoryBIOProtocol.connectionLost that calls Factory.clientConnectionLost. That is done by tcp.Client.connectionLost, via one of tcp.Client's myriad of base classes, at https://github.com/twisted/twisted/blob/3c868ac11786eef7ea269caa3056f00854128957/src/twisted/internet/tcp.py#L508. Of course, that doesn't get the benefit of TLSMemoryBIOProtocol's reason switcheroo. > > I'm still not quite sure who is in the wrong here. Aah, yeah, this is a weird quirk of the ancient-style layering in the SMTP code :-|. The way this should work is by using HostnameEndpoint. I'm not sure exactly where we're going off the rails, but by using both the old 'startTLS' style of starting a TLS connection, as well as relying on ClientFactory rather than an Endpoint of some kind, means that we're getting this duplicate notification; the one that you get to Protocol.connectionLost will come from TLS and have useful information, but the one that goes to the Connector will be coming straight from TCP. The right thing to fix here, I think, is to ignore clientConnectionLost entirely, and instead to have the protocol object relay its failure to some other differently-named method on SMTPSenderFactory. -g > > Cheers > > Richard > > > PS: yes, once we figure out what's going wrong here, I'll at least write up an issue... > > _______________________________________________ > Twisted-Python mailing list > Twisted-Python@twistedmatrix.com > https://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
![](https://secure.gravatar.com/avatar/e1554622707bedd9202884900430b838.jpg?s=120&d=mm&r=g)
On Apr 29, 2021, at 3:09 AM, Richard van der Hoff <richard@matrix.org> wrote: Right! That sounds plausible, and certainly gives me some places to poke. I'll have another look later. Thanks very much!
Glad to help. Looking forward to the resolution on this! For what it's worth: it may also be useful to explicitly account for the Connector object in the transport.startTLS subsystem, it would be good to have the same reason object delivered in both places. So fixing the SMTP code isn't necessarily the only path forward here. -g
![](https://secure.gravatar.com/avatar/e1554622707bedd9202884900430b838.jpg?s=120&d=mm&r=g)
Hi Richard, Sorry for the delayed response here. This is a bug in Twisted, and I think it boils down to this line: https://github.com/twisted/twisted/blob/3c868ac11786eef7ea269caa3056f0085412... <https://github.com/twisted/twisted/blob/3c868ac11786eef7ea269caa3056f0085412...> The code as written here appears to be expecting this sequence: failVerification is called with a reason containing a helpful OpenSSL verification error we save that reason as `self._reason` for reporting later, calling abortConnection() since the connection got aborted, we expect our underlying transport to call loseConnection on us we will then get a falsey reason [?!?!] and as such we will use self._reason instead Assumption 4 is nonsense and has never been true within Twisted as far as I know; connectionLost always gets a Failure, Failures are never falsey, so we will never use self._reason. To fix this we need to actually honor self._reason under the conditions we expect, i.e. it's a ConnectionAborted https://github.com/twisted/twisted/blob/3c868ac11786eef7ea269caa3056f0085412... <https://github.com/twisted/twisted/blob/3c868ac11786eef7ea269caa3056f0085412...> Does this make sense? (Will you be able to file / fix this bug?) Thanks for the report, -g
![](https://secure.gravatar.com/avatar/e5a514e14e44913930aa1ac15f508746.jpg?s=120&d=mm&r=g)
On 28/04/2021 07:06, Glyph wrote:
Thanks very much for your thoughts on this, Glyph - it's always helpful to have an insight into the intended design when trying to resolve this sort of thing. I don't follow your reasoning though. I think you might have misread the line you point to (https://github.com/twisted/twisted/blob/3c868ac11786eef7ea269caa3056f0085412...). It is "self._reason or reason" - ie, if self._reason is already set, it will take precedence over reason. In my tests at least, TLSMemoryBIOProtocol.connectionLost is doing exactly the right thing - it is called with an unhelpful reason, but substitutes back in the helpful reason which has already been stashed. Rather, the problem, as I see it, is that it's not TLSMemoryBIOProtocol.connectionLost that calls Factory.clientConnectionLost. That is done by tcp.Client.connectionLost, via one of tcp.Client's myriad of base classes, at https://github.com/twisted/twisted/blob/3c868ac11786eef7ea269caa3056f0085412.... Of course, that doesn't get the benefit of TLSMemoryBIOProtocol's reason switcheroo. I'm still not quite sure who is in the wrong here. Cheers Richard PS: yes, once we figure out what's going wrong here, I'll at least write up an issue...
![](https://secure.gravatar.com/avatar/e1554622707bedd9202884900430b838.jpg?s=120&d=mm&r=g)
> On Apr 28, 2021, at 2:43 PM, Richard van der Hoff <richard@matrix.org> wrote: > > On 28/04/2021 07:06, Glyph wrote: >>> Is the SMTP code holding the Factory wrong? Or is it reasonable to expect the verification error to propagate into clientConnectionFailed - in which case, how could this work? >>> >>> Thanks for your thoughts! >> Hi Richard, >> Sorry for the delayed response here. >> This is a bug in Twisted, and I think it boils down to this line: https://github.com/twisted/twisted/blob/3c868ac11786eef7ea269caa3056f00854128957/src/twisted/protocols/tls.py#L391 The code as written here appears to be expecting this sequence: >> 1. failVerification is called with a reason containing a helpful >> OpenSSL verification error >> 2. we save that reason as `self._reason` for reporting later, calling >> abortConnection() >> 3. since the connection got aborted, we expect our underlying transport >> to call loseConnection on us >> 4. we will then get a falsey reason [?!?!] and as such we will use >> self._reason instead >> Assumption 4 is nonsense and has never been true within Twisted as far as I know; connectionLost always gets a Failure, Failures are never falsey, so we will never use self._reason. To fix this we need to actually honor self._reason under the conditions we expect, i.e. it's a ConnectionAborted https://github.com/twisted/twisted/blob/3c868ac11786eef7ea269caa3056f00854128957/src/twisted/internet/error.py#L209 > > Thanks very much for your thoughts on this, Glyph - it's always helpful to have an insight into the intended design when trying to resolve this sort of thing. > > I don't follow your reasoning though. I think you might have misread the line you point to (https://github.com/twisted/twisted/blob/3c868ac11786eef7ea269caa3056f00854128957/src/twisted/protocols/tls.py#L391). It is "self._reason or reason" - ie, if self._reason is already set, it will take precedence over reason. Sigh. You're right, I read it backwards. > In my tests at least, TLSMemoryBIOProtocol.connectionLost is doing exactly the right thing - it is called with an unhelpful reason, but substitutes back in the helpful reason which has already been stashed. > > Rather, the problem, as I see it, is that it's not TLSMemoryBIOProtocol.connectionLost that calls Factory.clientConnectionLost. That is done by tcp.Client.connectionLost, via one of tcp.Client's myriad of base classes, at https://github.com/twisted/twisted/blob/3c868ac11786eef7ea269caa3056f00854128957/src/twisted/internet/tcp.py#L508. Of course, that doesn't get the benefit of TLSMemoryBIOProtocol's reason switcheroo. > > I'm still not quite sure who is in the wrong here. Aah, yeah, this is a weird quirk of the ancient-style layering in the SMTP code :-|. The way this should work is by using HostnameEndpoint. I'm not sure exactly where we're going off the rails, but by using both the old 'startTLS' style of starting a TLS connection, as well as relying on ClientFactory rather than an Endpoint of some kind, means that we're getting this duplicate notification; the one that you get to Protocol.connectionLost will come from TLS and have useful information, but the one that goes to the Connector will be coming straight from TCP. The right thing to fix here, I think, is to ignore clientConnectionLost entirely, and instead to have the protocol object relay its failure to some other differently-named method on SMTPSenderFactory. -g > > Cheers > > Richard > > > PS: yes, once we figure out what's going wrong here, I'll at least write up an issue... > > _______________________________________________ > Twisted-Python mailing list > Twisted-Python@twistedmatrix.com > https://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
![](https://secure.gravatar.com/avatar/e1554622707bedd9202884900430b838.jpg?s=120&d=mm&r=g)
On Apr 29, 2021, at 3:09 AM, Richard van der Hoff <richard@matrix.org> wrote: Right! That sounds plausible, and certainly gives me some places to poke. I'll have another look later. Thanks very much!
Glad to help. Looking forward to the resolution on this! For what it's worth: it may also be useful to explicitly account for the Connector object in the transport.startTLS subsystem, it would be good to have the same reason object delivered in both places. So fixing the SMTP code isn't necessarily the only path forward here. -g
participants (2)
-
Glyph
-
Richard van der Hoff