[Twisted-Python] MemoryError in twisted.internet.abstract.FileDescriptor
Hi! I'm writing a software using Python Twisted with the following architecture: [server]<->[providers] The providers (that can be thousands) send events to the centralized server, usually we have a few events per minute for each provider, but a burst of events can happen. So I write a test provider to simulate the sending of millions of events without any delay, via a thread like that: def stressTestTask(): time.sleep(1) c = provider_reference for i in range(1000000): event = c.createTestEvent() reactor.callFromThread(c.transport.write, event) Then I started to analyze memory consumption while running this test code, and I find out that the memory consumption grows to gigabytes and finally a MemoryError exception raises: Traceback (most recent call last): File "/usr/lib/python2.5/site-packages/twisted/python/log.py", line 84, in callWithLogger return callWithContext({"system": lp}, func, *args, **kw) File "/usr/lib/python2.5/site-packages/twisted/python/log.py", line 69, in callWithContext return context.call({ILogContext: newCtx}, func, *args, **kw) File "/usr/lib/python2.5/site-packages/twisted/python/context.py", line 118, in callWithContext return self.currentContext().callWithContext(ctx, func, *args, **kw) File "/usr/lib/python2.5/site-packages/twisted/python/context.py", line 81, in callWithContext return func(*args,**kw) --- <exception caught here> --- File "/usr/lib/python2.5/site-packages/twisted/internet/selectreactor.py", line 146, in _doReadOrWrite why = getattr(selectable, method)() File "/usr/lib/python2.5/site-packages/twisted/internet/tcp.py", line 177, in doWrite return Connection.doWrite(self) File "/usr/lib/python2.5/site-packages/twisted/internet/tcp.py", line 428, in doWrite result = abstract.FileDescriptor.doWrite(self) File "/usr/lib/python2.5/site-packages/twisted/internet/abstract.py", line 199, in doWrite self.dataBuffer = buffer(self.dataBuffer, self.offset) + "".join(self._tempDataBuffer) exceptions.MemoryError: After a search, I found this ticket: http://twistedmatrix.com/trac/ticket/3682 And I think that this is exact my problem. What's the best solution? Apply the patch attached on this ticket, moving to a producer/consumer approach, or any other idea? Thanks! -- Augusto Mecking Caringi
What's the best solution? Apply the patch attached on this ticket, moving to a producer/consumer approach, or any other idea?
The patch will just delay the problem... you're creating a huge number of strings, faster than the transport can write them out. The solution is indeed to use the consumer API to pause creation of more data until the transport clears its buffers.
On Fri, Dec 9, 2011 at 7:31 PM, Itamar Turner-Trauring <itamar@itamarst.org> wrote:
What's the best solution? Apply the patch attached on this ticket, moving to a producer/consumer approach, or any other idea?
The patch will just delay the problem... you're creating a huge number of strings, faster than the transport can write them out. The solution is indeed to use the consumer API to pause creation of more data until the transport clears its buffers.
Right, I will try with the consumer API. But I have one last question: In my previous example, the memory usage grows until a MemoryError exception. But other scenario is when my "event send loop" iterates a high number of times (but not enough to raise a exception) and then stops. I expected that when the loop ends, all the strings would be flushed and as a consequence, the memory usage of the process would return to a normal level. But this does not happen... It's normal? Thanks! -- Augusto Mecking Caringi
On 10 Dec, 05:25 pm, augustocaringi@gmail.com wrote:
On Fri, Dec 9, 2011 at 7:31 PM, Itamar Turner-Trauring <itamar@itamarst.org> wrote:
� � What's the best solution? Apply the patch attached on this ticket, moving to a producer/consumer approach, or any other idea?
The patch will just delay the problem... you're creating a huge number of strings, faster than the transport can write them out. The solution is indeed to use the consumer API to pause creation of more data until the transport clears its buffers.
Right, I will try with the consumer API.
But I have one last question:
In my previous example, the memory usage grows until a MemoryError exception.
But other scenario is when my "event send loop" iterates a high number of times (but not enough to raise a exception) and then stops.
I expected that when the loop ends, all the strings would be flushed and as a consequence, the memory usage of the process would return to a normal level. But this does not happen... It's normal?
Data may or may not be put onto the network as you are directing a transport to write it. It's up to the particular transport implementation to decide on buffering logic, including logic about whether data is sent immediately when a write() call is, or only later after control returns to the event loop. As of Twisted 11.1, the posix-based reactor implementations all buffer data until control is returned to the event loop. This has been the case for some time, but not _all_ time, and it may change in the future. Jean-Paul
On Sun, Dec 11, 2011 at 5:11 PM, <exarkun@twistedmatrix.com> wrote:
On 10 Dec, 05:25 pm, augustocaringi@gmail.com wrote:
On Fri, Dec 9, 2011 at 7:31 PM, Itamar Turner-Trauring <itamar@itamarst.org> wrote:
What's the best solution? Apply the patch attached on this ticket, moving to a producer/consumer approach, or any other idea?
The patch will just delay the problem... you're creating a huge number of strings, faster than the transport can write them out. The solution is indeed to use the consumer API to pause creation of more data until the transport clears its buffers.
Right, I will try with the consumer API.
But I have one last question:
In my previous example, the memory usage grows until a MemoryError exception.
But other scenario is when my "event send loop" iterates a high number of times (but not enough to raise a exception) and then stops.
I expected that when the loop ends, all the strings would be flushed and as a consequence, the memory usage of the process would return to a normal level. But this does not happen... It's normal?
Data may or may not be put onto the network as you are directing a transport to write it. It's up to the particular transport implementation to decide on buffering logic, including logic about whether data is sent immediately when a write() call is, or only later after control returns to the event loop.
As of Twisted 11.1, the posix-based reactor implementations all buffer data until control is returned to the event loop. This has been the case for some time, but not _all_ time, and it may change in the future.
Jean-Paul
Hi! I understand... But I'm not convinced. I found this post on Stackoverflow: http://stackoverflow.com/questions/4078084/memory-leak-in-python-twisted-whe... The problem that I am facing is the same. Under heavy load my Twisted based server "leaks memory". Under normal load its all ok. Explaining better: Under heavy load the memory usage grows, but when the load returns to a normal level, the memory usage remains high. I forgot to say in the previous emails, but the communication is done over SSL. I'm not a Python specialist, I am a C engineer... But I'm trying to find out where is the problem with tools like Heapy, meliae, etc. Meliae gave me this report: Total 62008 objects, 126 types, Total size = 86.2MiB (90437121 bytes) Index Count % Size % Cum Max Kind 0 39061 62 87827505 97 9753120905 str 1 1682 2 896912 0 98 393352 dict 2 10067 16 382028 0 98 224 tuple 3 164 0 246656 0 98 6304 module 4 422 0 183992 0 99 436 type 5 2504 4 170272 0 99 68 code As expected... The problem are the strings that are never freed. So I tried to tracking this strings, and this let me to this:
om[3082188716] TLSConnection(3082188716 548B 35refs 38par)
om[3082188716].c [str(3082713312 39B 8par '_tempDataBuffer'), list(3082225612 112704B 26680refs 1par), str(3086791008 30B 54par 'fileno'), instancemethod(3084243748 36B 3refs 1par), str(3084439680 32B 58par 'protocol'), Provider(3082216172 552B 23refs 4par), str(3084439648 31B 100par 'reactor'), SelectReactor(3082464620 548B 43refs 21par), str(3082712592 37B 6par '_userWantRead'), bool(40623276 12B 94par 'True'), str(3082713272 34B 8par 'dataBuffer'), str(2917687304 53120905B 1par 'ype: Test\nDate: 2011-12-14 16:45:30.995217\nEvent-Subtype: FooBar\nContent-Type: application/json\n\n{\n '), str(3082831376 36B 6par '_tempDataLen'), int(165862528 12B 1par 33109881), str(3082714752 35B 6par 'realAddress'), tuple(3082165324 32B 2refs 1par), str(3084479288 33B 29par 'connector'), Connector(3082188684 552B 21refs 1par), str(3082724352 30B 9par 'logstr'), str(3082181232 39B 1par 'Provider,client'), str(3082723680 27B 5par 'TLS'), int(164303008 12B 100par 1), str(3084439744 30B 80par 'socket'), Connection(3082208356 40B 3refs 1par), str(3084506080 33B 33par 'connected'), int(164303008 12B 100par 1), str(3087080928 30B 10par 'offset'), int(167915152 12B 1par 11436032), str(3082712632 38B 6par '_userWantWrite'), bool(40623276 12B 94par 'True'), str(3082685960 34B 8par 'ctxFactory'), ClientContextFactory(3082188652 168B 1refs 2par), str(3086955200 28B 38par 'addr'), tuple(3082649516 32B 2refs 1par), <ex-reference>(0 0B)]
Thanks and sorry for the insistence. -- Augusto Mecking Caringi
Augusto Mecking Caringi wrote:
om[3082188716].c […] str(2917687304 53120905B 1par 'ype: Test\nDate: 2011-12-14 16:45:30.995217\nEvent-Subtype: FooBar\nContent-Type: application/json\n\n{\n ')
So it appears you have 53 megabyte string that's referenced from your TLS connection, presumably in a buffer of data waiting to be sent? At the moment this memory analysis was captured, do you know how much of this data the client has received? Also, how was this data passed to the transport — in one big 53MB write call, or via many smaller calls, or via a pull producer, or…? This looks like an HTTP-like protocol, so possibly there's a library between your code and the transport that means you don't know the direct answer to that question. But whatever details you can give about how you are passing the data you generate to the connection will help locate the problem, I think. -Andrew.
On Wed, Dec 14, 2011 at 9:49 PM, Andrew Bennetts <andrew@bemusement.org> wrote:
Augusto Mecking Caringi wrote:
om[3082188716].c […] str(2917687304 53120905B 1par 'ype: Test\nDate: 2011-12-14 16:45:30.995217\nEvent-Subtype: FooBar\nContent-Type: application/json\n\n{\n ')
So it appears you have 53 megabyte string that's referenced from your TLS connection, presumably in a buffer of data waiting to be sent? At the moment this memory analysis was captured, do you know how much of this data the client has received?
Yes... I have a 53MB string referenced in my TLS connection, but *all* the data has already been sent. At the moment that this memory was captured all the data has been sent.
Also, how was this data passed to the transport — in one big 53MB write call, or via many smaller calls, or via a pull producer, or…?
Via many smaller calls. One call for each event, in a code like that (running in a thread): def stressTestTask(): time.sleep(1) c = provider_reference for i in range(1000000): event = c.createTestEvent() reactor.callFromThread(c.transport.write, event) I have two components in my system: the server and the provider, both of them written in Python/Twisted. This capture was from a "stress test provider" that send hundreds of thousands of events to the server without any "sleep". And I captured this memory analysis after the server has received all the data.
This looks like an HTTP-like protocol, so possibly there's a library between your code and the transport that means you don't know the direct answer to that question. But whatever details you can give about how you are passing the data you generate to the connection will help locate the problem, I think.
Yes, this a HTTP-like protocol, but there is no library, the protocol was designed by me. Thanks. -- Augusto Mecking Caringi
On Dec 14, 2011, at 7:38 PM, Augusto Mecking Caringi wrote:
On Wed, Dec 14, 2011 at 9:49 PM, Andrew Bennetts <andrew@bemusement.org> wrote:
Augusto Mecking Caringi wrote:
om[3082188716].c […] str(2917687304 53120905B 1par 'ype: Test\nDate: 2011-12-14 16:45:30.995217\nEvent-Subtype: FooBar\nContent-Type: application/json\n\n{\n ')
So it appears you have 53 megabyte string that's referenced from your TLS connection, presumably in a buffer of data waiting to be sent? At the moment this memory analysis was captured, do you know how much of this data the client has received?
Yes... I have a 53MB string referenced in my TLS connection, but *all* the data has already been sent.
At the moment that this memory was captured all the data has been sent.
Also, how was this data passed to the transport — in one big 53MB write call, or via many smaller calls, or via a pull producer, or…?
Via many smaller calls. One call for each event, in a code like that (running in a thread):
def stressTestTask(): time.sleep(1) c = provider_reference for i in range(1000000): event = c.createTestEvent() reactor.callFromThread(c.transport.write, event)
I have two components in my system: the server and the provider, both of them written in Python/Twisted. This capture was from a "stress test provider" that send hundreds of thousands of events to the server without any "sleep". And I captured this memory analysis after the server has received all the data.
This looks like an HTTP-like protocol, so possibly there's a library between your code and the transport that means you don't know the direct answer to that question. But whatever details you can give about how you are passing the data you generate to the connection will help locate the problem, I think.
Yes, this a HTTP-like protocol, but there is no library, the protocol was designed by me.
Thanks.
If everything you've said here is accurate, this is definitely a bug in Twisted. It would be really valuable if you could boil down your example to be as minimal as possible - ideally a unit test, but failing that, a script which could be trivially run to leak an arbitrary amount of memory would be almost as good. Does your callFromThread(c.transport.write, ...) code reliably trigger this error when talking to a very simple protocol, like an Echo or Discard protocol? Thanks a lot for helping us track this down, -glyph
On 08:34 pm, augustocaringi@gmail.com wrote:
On Sun, Dec 11, 2011 at 5:11 PM, <exarkun@twistedmatrix.com> wrote:
On 10 Dec, 05:25 pm, augustocaringi@gmail.com wrote:
On Fri, Dec 9, 2011 at 7:31 PM, Itamar Turner-Trauring <itamar@itamarst.org> wrote:
� � What's the best solution? Apply the patch attached on this ticket, moving to a producer/consumer approach, or any other idea?
The patch will just delay the problem... you're creating a huge number of strings, faster than the transport can write them out. The solution is indeed to use the consumer API to pause creation of more data until the transport clears its buffers.
Right, I will try with the consumer API.
But I have one last question:
In my previous example, the memory usage grows until a MemoryError exception.
But other scenario is when my "event send loop" iterates a high number of times (but not enough to raise a exception) and then stops.
I expected that when the loop ends, all the strings would be flushed and as a consequence, the memory usage of the process would return to a normal level. But this does not happen... It's normal?
Data may or may not be put onto the network as you are directing a transport to write it. �It's up to the particular transport implementation to decide on buffering logic, including logic about whether data is sent immediately when a write() call is, or only later after control returns to the event loop.
As of Twisted 11.1, the posix-based reactor implementations all buffer data until control is returned to the event loop. �This has been the case for some time, but not _all_ time, and it may change in the future.
Jean-Paul
Hi!
I understand...
But I'm not convinced.
I found this post on Stackoverflow: http://stackoverflow.com/questions/4078084/memory-leak-in-python- twisted-where-is-it
The problem that I am facing is the same.
Under heavy load my Twisted based server "leaks memory". Under normal load its all ok.
Explaining better: Under heavy load the memory usage grows, but when the load returns to a normal level, the memory usage remains high.
This is basically typical. I won't guarantee that your program isn't *leaking* memory, but it sounds like you're just observing the "high water mark" behavior of the CPython runtime (and to a lesser extend of glibc). Allocate memory is frequently not released to the platform when it is no longer in use, because fragmentation has happened that makes this impractical. Can you clarify if this is what you're seeing, or if you do see memory usage continue to rise without bound (under a bounded load)? Jean-Paul
I forgot to say in the previous emails, but the communication is done over SSL.
I'm not a Python specialist, I am a C engineer... But I'm trying to find out where is the problem with tools like Heapy, meliae, etc.
Meliae gave me this report:
Total 62008 objects, 126 types, Total size = 86.2MiB (90437121 bytes) Index Count % Size % Cum Max Kind 0 39061 62 87827505 97 9753120905 str 1 1682 2 896912 0 98 393352 dict 2 10067 16 382028 0 98 224 tuple 3 164 0 246656 0 98 6304 module 4 422 0 183992 0 99 436 type 5 2504 4 170272 0 99 68 code
As expected... The problem are the strings that are never freed.
So I tried to tracking this strings, and this let me to this:
om[3082188716] TLSConnection(3082188716 548B 35refs 38par) om[3082188716].c [str(3082713312 39B 8par '_tempDataBuffer'), list(3082225612 112704B 26680refs 1par), str(3086791008 30B 54par 'fileno'), instancemethod(3084243748 36B 3refs 1par), str(3084439680 32B 58par 'protocol'), Provider(3082216172 552B 23refs 4par), str(3084439648 31B 100par 'reactor'), SelectReactor(3082464620 548B 43refs 21par), str(3082712592 37B 6par '_userWantRead'), bool(40623276 12B 94par 'True'), str(3082713272 34B 8par 'dataBuffer'), str(2917687304 53120905B 1par 'ype: Test\nDate: 2011-12-14 16:45:30.995217\nEvent-Subtype: FooBar\nContent-Type: application/json\n\n{\n '), str(3082831376 36B 6par '_tempDataLen'), int(165862528 12B 1par 33109881), str(3082714752 35B 6par 'realAddress'), tuple(3082165324 32B 2refs 1par), str(3084479288 33B 29par 'connector'), Connector(3082188684 552B 21refs 1par), str(3082724352 30B 9par 'logstr'), str(3082181232 39B 1par 'Provider,client'), str(3082723680 27B 5par 'TLS'), int(164303008 12B 100par 1), str(3084439744 30B 80par 'socket'), Connection(3082208356 40B 3refs 1par), str(3084506080 33B 33par 'connected'), int(164303008 12B 100par 1), str(3087080928 30B 10par 'offset'), int(167915152 12B 1par 11436032), str(3082712632 38B 6par '_userWantWrite'), bool(40623276 12B 94par 'True'), str(3082685960 34B 8par 'ctxFactory'), ClientContextFactory(3082188652 168B 1refs 2par), str(3086955200 28B 38par 'addr'), tuple(3082649516 32B 2refs 1par), <ex-reference>(0 0B)]
Thanks and sorry for the insistence.
-- Augusto Mecking Caringi
_______________________________________________ Twisted-Python mailing list Twisted-Python@twistedmatrix.com http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
participants (5)
-
Andrew Bennetts
-
Augusto Mecking Caringi
-
exarkun@twistedmatrix.com
-
Glyph
-
Itamar Turner-Trauring