[Twisted-Python] Need some enlightenment on using web client properly, or maybe nudge a bug to get fixed
I have written a simple service which takes data from the network, massages it until it's useful enough, and sends the results out periodically via HTTP to an API.
It all works for a while, then I get an error like this approximately 40 minutes into the service's uptime:
ResponseNeverReceived: [<twisted.python.failure.Failure OpenSSL.SSL.ZeroReturnError: >]
Then a couple more like this:
ResponseNeverReceived: [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
Then it ends with
TimeoutError: User timeout caused connection failure.
Then every request results in the same TimeoutError. I don't know whether using HTTPS is important in this case.
Restarting the whole service, of course, makes the problem go away for a while. The other side is the Slack API, so I rather assume it's not very much to blame; it can be demonstrated to work rather reliably, all its criticisms notwithstanding.
I cannot yet tell if this bug is a function of uptime, or the number of requests made.
I have tried to work around the problem by discarding the agent object, and using an HTTPConnectionPool with persistent=False, but it didn't help at all. I think it made the problem worse because the framework seems to refer to some objects the Agent creates, and the process becomes a CPU hog in a couple of hours (with the TimeoutErrors still happening all the time).
The closest I've got on the internets which describes a similar problem, apart from people complaining on StackOverflow about precisely this happening when they are using Scrapy, is this blog post from almost a decade ago: http://www.chris-wong.net/twisted-web-framework-user-timeout-caused-connecti... .
There could be a small chance I'm holding it wrong(tm), but maybe there exists a ticket, just worded differently, which could help me get to the bottom of it.
-- Yaroslav Fedevych IT Philosopher
Hi Jarosław!
On Jul 1, 2019, at 4:48 PM, Jarosław Fedewicz <jaroslaw.fedewicz@gmail.com> wrote:
I have written a simple service which takes data from the network, massages it until it's useful enough, and sends the results out periodically via HTTP to an API.
A reasonable start :-).
It all works for a while, then I get an error like this approximately 40 minutes into the service's uptime:
ResponseNeverReceived: [<twisted.python.failure.Failure OpenSSL.SSL.ZeroReturnError: >]
Then a couple more like this:
ResponseNeverReceived: [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
Then it ends with
TimeoutError: User timeout caused connection failure.
Then every request results in the same TimeoutError. I don't know whether using HTTPS is important in this case.
I'm pretty sure the presence of an OpenSSL.SSL error indeed means that HTTPS is important.
Restarting the whole service, of course, makes the problem go away for a while. The other side is the Slack API, so I rather assume it's not very much to blame; it can be demonstrated to work rather reliably, all its criticisms notwithstanding.
It does seem likely that the clustering of errors you're seeing is a local problem with Twisted.
I cannot yet tell if this bug is a function of uptime, or the number of requests made.
My personal guess is that it has something to do with the number of TCP connections; or, specifically, the number of pyOpenSSL 'Connection' objects.
I have tried to work around the problem by discarding the agent object, and using an HTTPConnectionPool with persistent=False, but it didn't help at all. I think it made the problem worse because the framework seems to refer to some objects the Agent creates, and the process becomes a CPU hog in a couple of hours (with the TimeoutErrors still happening all the time).
I have a slight suspicion that the thing that is leaking between connections here is the pyOpenSSL "Context" object. We recently implemented an optimization which shares the Context object among multiple Connection objects that reference the same host. What version of Twisted are you using, and what version of OpenSSL, pyOpenSSL, and Cryptography? I'm curious whether reversing that optimization would make any difference to your use-case.
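One possible way to collect the version information asked for here is sketched below. It uses only the standard library; the `report_versions` helper and the `PACKAGES` tuple are illustrative names, not part of Twisted. Note the assumption flagged in the code: `ssl.OPENSSL_VERSION` reports the OpenSSL that Python's own `ssl` module is linked against, which may not be the same build that cryptography and pyOpenSSL use.

```python
import ssl
from importlib import metadata

# Distribution names to look up (illustrative default; adjust as needed).
PACKAGES = ("Twisted", "pyOpenSSL", "cryptography", "service-identity")


def report_versions(packages=PACKAGES):
    """Return a dict mapping each distribution name to its installed version.

    Distributions that are not installed map to None instead of raising.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    # OpenSSL as seen by the stdlib ssl module. Assumption: this may differ
    # from the library that cryptography/pyOpenSSL are linked against.
    versions["stdlib ssl OpenSSL"] = ssl.OPENSSL_VERSION
    return versions


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running this on both the target machine and the development machine would make any difference between the two environments visible side by side.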
The closest I've got on the internets which describes a similar problem, apart from people complaining on StackOverflow about precisely this happening when they are using Scrapy, is this blog post from almost a decade ago: http://www.chris-wong.net/twisted-web-framework-user-timeout-caused-connection-failure/ .
This definitely seems like a bug, if it's occurring in multiple places.
There could be a small chance I'm holding it wrong(tm), but maybe there exists a ticket, just worded differently, which could help me get to the bottom of it.
I don't think that any open tickets describe your precise issue. So please do open one. And if possible, can you minimize a proof of concept? Some example code would go a long way toward helping to isolate this. -glyph
So far, I tried to minimize a test case, but it seems like it's really picky about what environment it's running in. One of those cases where "it works on my machine", I suppose. The versions are as follows: cryptography==2.7 pyOpenSSL==19.0.0 asn1crypto==0.24.0 pyasn1==0.4.5 pyasn1-modules==0.2.5 Twisted==19.2.1 The target machine is running Xenial, so openssl 1.0.0g. My local machine runs Fedora 30, thus openssl 1.1.1c. Is there a neat way to list all pyOpenSSL objects in a running Twisted program? Or maybe TCPConnection objects, since those might hook to the zope.interface machinery?
-- Yaroslav Fedevych IT Philosopher
On Thursday, 11 July 2019 12:00:33 CEST Jarosław Fedewicz wrote:
Is there a neat way to list all pyOpenSSL objects in a running Twisted program? Or maybe TCPConnection objects, since those might hook to the zope.interface machinery?
Not specific to Twisted, but you can get a list of all objects tracked by the garbage collector using "gc.get_objects()" and then filter that by class. Bye, Maarten
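A minimal sketch of that suggestion follows. The `instances_of` helper and the `Connection` stand-in class are hypothetical names used for illustration; in the real service one would presumably pass a class of interest such as `OpenSSL.SSL.Connection` or `OpenSSL.SSL.Context` instead. Only objects tracked by the garbage collector are visible this way.

```python
import gc


def instances_of(cls):
    """Return all live, gc-tracked objects that are instances of cls."""
    gc.collect()  # drop unreachable garbage first so only live objects remain
    found = []
    for obj in gc.get_objects():
        try:
            if isinstance(obj, cls):
                found.append(obj)
        except ReferenceError:
            # A weakref proxy to a dead object can raise during isinstance().
            continue
    return found


class Connection:
    """Stand-in for a class of interest (hypothetical placeholder)."""


if __name__ == "__main__":
    live = [Connection() for _ in range(3)]
    print(len(instances_of(Connection)))
```

Keep in mind that the returned list itself holds references to the objects, so it should be discarded once inspected to avoid keeping them alive.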
On Thursday, 11 July 2019 11:00:33 BST Jarosław Fedewicz wrote:
So far, I tried to minimize a test case, but it seems like it's really picky about what environment it's running in. One of those cases where "it works on my machine", I suppose. The versions are as follows:
cryptography==2.7 pyOpenSSL==19.0.0 asn1crypto==0.24.0 pyasn1==0.4.5 pyasn1-modules==0.2.5 Twisted==19.2.1
The target machine is running Xenial, so openssl 1.0.0g.
That's old... Can you go to 1.0.2s? I recall that pyOpenSSL may need newer openssl - might be wrong on this.
My local machine runs Fedora 30, thus openssl 1.1.1c.
Is there a neat way to list all pyOpenSSL objects in a running Twisted program? Or maybe TCPConnection objects, since those might hook to the zope.interface machinery?
You can use the gc to help with this sort of debugging.
    import gc
    gc.collect()
    for obj in gc.get_objects():
        pass  # do something interesting with obj
You could count the number of each type of obj and look for which ones increase over time.
Barry
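The counting idea could be sketched roughly as follows: take a snapshot of object counts per type, take another one later, and look at which types grew. The names `type_counts`, `growth`, and `Leaky` are made up for this illustration and are not part of Twisted or pyOpenSSL.

```python
import gc
from collections import Counter


def type_counts():
    """Count live, gc-tracked objects, keyed by fully qualified type name."""
    gc.collect()
    counts = Counter()
    for obj in gc.get_objects():
        t = type(obj)
        counts[f"{t.__module__}.{t.__qualname__}"] += 1
    return counts


def growth(before, after, top=10):
    """Return (type name, increase) pairs for the types that grew the most."""
    delta = after - before  # Counter subtraction keeps only positive differences
    return delta.most_common(top)


class Leaky:
    """Hypothetical stand-in for a class whose instances accumulate."""


if __name__ == "__main__":
    baseline = type_counts()
    leaked = [Leaky() for _ in range(100)]
    for name, increase in growth(baseline, type_counts()):
        print(f"{name}: +{increase}")
```

In a long-running Twisted service, one option would be to call `type_counts()` periodically (for instance from a `twisted.internet.task.LoopingCall`) and log the output of `growth()` against the first snapshot, then watch for types whose count keeps climbing.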
participants (4): Glyph; Jarosław Fedewicz; Maarten ter Huurne; Scott, Barry