[Twisted-Python] Memory size of Deferreds

Hello, I am working on a project called VIFF (http://viff.dk/) which does secure multiparty computations. Twisted, and in particular the Deferred class, has been a great tool for this! Each variable in a computation is stored as a Deferred, and the computation builds an expression tree based on these Deferreds, which is then evaluated as the Deferreds fire their callbacks.

What worries me is the size of a single Deferred. One user wanted to do computations on very large lists of Deferreds and reported that his programs used more memory than expected. I tried to measure the size of a Deferred by creating large lists of them and found that a single empty Deferred takes up about 200 bytes. Adding a callback brings that up to about 500 bytes. The discussion can be found here: http://thread.gmane.org/gmane.comp.cryptography.viff.devel/232

I might be measuring this the wrong way -- I simply looked at how the memory size grew in 'top' when I created more and more lists. As noted in the thread above, defining the __slots__ attribute helps -- it saves the 144 bytes needed for the default object dictionary. Maybe that could be done for Deferred?

Has anybody measured how much memory the C implementation uses per Deferred? I have not tried it out yet. http://twistedmatrix.com/trac/ticket/2245

-- Martin Geisler VIFF (Virtual Ideal Functionality Framework) brings easy and efficient SMPC (Secure Multi-Party Computation) to Python. See: http://viff.dk/.
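[Editor's note: on newer Python versions, sys.getsizeof gives a more direct way to estimate per-object cost than watching 'top'. A sketch, using a hypothetical stand-in class rather than Twisted's real Deferred; exact figures vary by Python version and platform:]

```python
import sys

class FakeDeferred(object):
    """Stand-in with roughly Deferred's instance attributes.
    (Illustration only -- not Twisted's actual class.)"""
    def __init__(self):
        self.callbacks = []
        self.called = 0
        self.paused = 0
        self.result = None

d = FakeDeferred()
# getsizeof counts only the object header; the per-instance
# __dict__ and the callbacks list live in separate allocations,
# so they have to be added in by hand.
total = (sys.getsizeof(d)
         + sys.getsizeof(d.__dict__)
         + sys.getsizeof(d.callbacks))
print(total)
```

The per-instance __dict__ typically dominates this total, which is what makes __slots__ attractive here.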

2008/5/19 Martin Geisler <mg@daimi.au.dk>:
Since memory allocation is tightly tied to the runtime environment, it is hard to give reasonable answers without knowing at least the OS and CPU. Did the memory in the case you mention grow constantly? Is that user sure it was not a memory leak in his/her code?

Deferreds, as you already know, are a core concept that Twisted is built upon. If you need to optimize such areas of Twisted, then -- just my suggestion -- maybe you are using the wrong tool for the job. If memory usage is a problem, then maybe you are trying to do everything at once when you should do things sequentially (e.g. using a generator). Maybe you should look into other areas of the problem (e.g. not try to optimize Deferreds, but the code that uses them). Maybe you should use something other than Deferreds for that job.

As for the memory usage of such small objects as Deferreds, well... Python is Python. A Python string or integer uses much more memory than a C string or int. You can't tune Python string memory usage, and you can hardly do anything about the size of Deferred if that becomes a problem. On the other hand, you should be able to write code faster with Python, and the memory footprint shouldn't be a problem (in my experience, Twisted's footprint is surprisingly low in many cases).

In other words -- if the size of the Deferred object is your only bottleneck, then congratulations :-) I'd rather suggest optimizing other parts of the code. As Deferreds are the core of Twisted, I suppose it would be hard to do anything about them, in the same way you can't do anything about the memory footprint of core Python objects. Regards, -- m

"Michal Pasternak" <michal.dtz@gmail.com> writes: Thanks for the reply!
Ah, sorry: I tested this on an Athlon 64 X2 (but with 32-bit Debian). I simply tested like this:

% python
Python 2.5.2 (r252:60911, Apr 17 2008, 13:15:05)
[GCC 4.2.3 (Debian 4.2.3-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
and noted how the memory usage rose from 8 MiB to 30 MiB, to 52 MiB, to 73 MiB in 'top'. So that is about 22 MiB per list -- around 200 bytes per Deferred. (I know there is some overhead for the list.)
Did the memory in the case you mention grow constantly? Is that user sure it was not a memory leak in his/her code?
I think he was just concerned that his application used 450 MiB RAM, and that made me try to see how big an empty Deferred is.
Yes, you may be right... scheduling everything to run at once is probably not necessary. It just makes things simple with my design. I would love to get some advice on the design of my library -- it is the first thing I have made with Twisted.

The goal of VIFF is to enable shared computations -- think addition and multiplication. So the user writes

  x = add(a, mul(b, c))

where a, b, c are Deferreds which will eventually hold integers. The add and mul functions return new Deferreds which will eventually hold the results. Addition is simple:

  def add(x, y):
      sum = gatherResults([x, y])
      sum.addCallback(lambda results: results[0] + results[1])
      return sum

Multiplication is similar, but requires network communication -- that is why it returns a Deferred. This is the basic operation of the library: an expression tree is built and evaluated from the leaves inwards. The important point is that as much as possible is done in parallel, and that all operations run as soon as their operands are available. Twisted made this extremely easy thanks to the DeferredList and Deferred abstractions.
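[Editor's note: for readers without Twisted at hand, the add() pattern above can be sketched with a tiny stand-in. MiniDeferred and gather_results below are toy illustrations that only mimic the behavior of Twisted's Deferred and gatherResults:]

```python
class MiniDeferred:
    """Tiny stand-in for twisted.internet.defer.Deferred (toy only)."""
    def __init__(self):
        self.callbacks = []
        self.called = False
        self.result = None

    def addCallback(self, fn):
        if self.called:
            self.result = fn(self.result)   # already fired: run immediately
        else:
            self.callbacks.append(fn)       # not yet fired: queue it
        return self

    def callback(self, result):
        self.called = True
        self.result = result
        for fn in self.callbacks:
            self.result = fn(self.result)
        self.callbacks = []

def gather_results(ds):
    """Fires with a list of results once every input has fired
    (mirrors gatherResults; order of results is preserved)."""
    out = MiniDeferred()
    results = [None] * len(ds)
    remaining = [len(ds)]
    def make_cb(i):
        def cb(value):
            results[i] = value
            remaining[0] -= 1
            if remaining[0] == 0:
                out.callback(list(results))
            return value
        return cb
    for i, d in enumerate(ds):
        d.addCallback(make_cb(i))
    return out

def add(x, y):
    s = gather_results([x, y])
    s.addCallback(lambda r: r[0] + r[1])
    return s

a, b = MiniDeferred(), MiniDeferred()
total = add(a, b)
a.callback(2)
b.callback(3)
print(total.result)  # 5
```

The key property this demonstrates is that add() returns immediately with a Deferred, and the sum materializes only once both operands have fired -- the essence of the expression-tree evaluation described above.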
There is the possibility of using the __slots__ attribute to avoid the object dictionary. I don't know what negative side effects that might have, though. Also, I have yet to try the C implementation of Deferred to see how it performs memory-wise.
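[Editor's note: one visible side effect is that a slotted instance rejects any attribute not listed in __slots__, which would break code that tacks extra state onto a Deferred. A minimal demonstration:]

```python
class Slotted(object):
    __slots__ = ['result']

s = Slotted()
s.result = 42          # fine: 'result' is listed in __slots__
try:
    s.extra = "tag"    # not listed -> AttributeError
except AttributeError as e:
    print("rejected:", e)
```

Any user code (or subclass without its own __slots__) that stores ad-hoc attributes on Deferred instances would stop working, which is one reason such a change to a core class needs care.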
Hehe, thanks! :-) I am certainly not claiming that Deferreds are the only bottleneck or consumer of memory. I just figured that they would be an easy starting point since my code (and as you say, Twisted in general) uses them a lot.
Right... I'll try looking through the code to see if I by mistake hold on to Deferreds or other objects longer than I have to. -- Martin Geisler VIFF (Virtual Ideal Functionality Framework) brings easy and efficient SMPC (Secure Multi-Party Computation) to Python. See: http://viff.dk/.

Martin Geisler wrote:
Surely (he said, tongue firmly in cheek) the answer to having too many outstanding Deferreds is to use faster peer systems :-) regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/

It is not that difficult to see what is inside a Deferred (spawn some debugger and see):

(this may be a class variable)

  debug (boolean)

(those surely are instance variables)

  callbacks (a list, initially empty, later containing tuples of tuples)
  called (int)
  paused (int)
  result (anything)
  timeoutCall (None or object)
  _runningCallbacks (boolean)
  _debugInfo (object of type DebugInfo)

Bare names of attributes take about 60 bytes (this is what could be reclaimed with __slots__), some of them are references, and there is a list (which internally contains complicated structures)... It could be an interesting experiment to define __slots__ for the Deferred class and see what the memory impact is.

Hmm, hmm. I did an experiment. Here is one version of the testing code:

  from twisted.internet.defer import Deferred

  Cls = Deferred
  count = 250000

  import os
  pid = os.getpid()
  cmd = "ps -F %d" % pid

  a = [Cls() for _ in xrange(count)]
  os.system(cmd)
  b = [Cls() for _ in xrange(count)]
  os.system(cmd)
  c = [Cls() for _ in xrange(count)]
  os.system(cmd)

and here is one using __slots__:

  from twisted.internet.defer import timeout, Deferred

  class SlotsDeferred(object):
      __slots__ = ['debug', 'callbacks', 'called', 'paused', 'result',
                   'timeoutCall', '_runningCallbacks', '_debugInfo']
      # whole content of original Deferred class copied as-is here

  Cls = SlotsDeferred
  count = 250000

  import os
  pid = os.getpid()
  cmd = "ps -F %d" % pid

  a = [Cls() for _ in xrange(count)]
  os.system(cmd)
  b = [Cls() for _ in xrange(count)]
  os.system(cmd)
  c = [Cls() for _ in xrange(count)]
  os.system(cmd)

And here are the results:

(standard Deferred)

  $ python testdef.py
  UID     PID   PPID  C    SZ    RSS PSR STIME TTY    STAT TIME CMD
  marcink 23109 20536 92 15180  56992   1 15:40 pts/18 S+  0:00 python testdef.py
  UID     PID   PPID  C    SZ    RSS PSR STIME TTY    STAT TIME CMD
  marcink 23109 20536 99 28374 109000   1 15:40 pts/18 S+  0:02 python testdef.py
  UID     PID   PPID  C    SZ    RSS PSR STIME TTY    STAT TIME CMD
  marcink 23109 20536 99 41568 161020   1 15:40 pts/18 S+  0:05 python testdef.py

(Deferred with __slots__ added)

  $ python testdef.py
  UID     PID   PPID  C    SZ    RSS PSR STIME TTY    STAT TIME CMD
  marcink 23127 20536  0  7834  28068   1 15:41 pts/18 S+  0:00 python testdef.py
  UID     PID   PPID  C    SZ    RSS PSR STIME TTY    STAT TIME CMD
  marcink 23127 20536 99 13683  51160   0 15:41 pts/18 S+  0:01 python testdef.py
  UID     PID   PPID  C    SZ    RSS PSR STIME TTY    STAT TIME CMD
  marcink 23127 20536 99 19532  74256   1 15:41 pts/18 S+  0:02 python testdef.py

As one can see, the memory usage is roughly halved. So... maybe it would make sense to add __slots__ to Deferreds after all?

--
Marcin Kasperski | In any large change, 1/3 will think it is great,
http://mekk.waw.pl | 1/3 will think it is stupid, and 1/3 will wait. (Reed)

Marcin Kasperski <Marcin.Kasperski@softax.com.pl> writes:
Yeah, I have already looked at the code for the Deferred class -- I like it, it is delightfully small! :-)
Cool -- that is a substantial saving! I did an experiment where I tested these classes:
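[Editor's note: the class definitions were lost from the archive; presumably they were along these lines (a reconstruction, assuming X is the slotted class and Y the plain one):]

```python
class X(object):
    """Attributes stored in fixed slots -- no per-instance __dict__."""
    __slots__ = ['a', 'b']

class Y(object):
    """Ordinary class: every instance carries a __dict__."""
    pass

x, y = X(), Y()
print(hasattr(x, '__dict__'))  # False -- the dict is gone entirely
print(hasattr(y, '__dict__'))  # True
```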
and found that X objects take up about 40 bytes in memory whereas Y objects take up 180 bytes in memory. This mail: http://mail.python.org/pipermail/python-list/2002-March/135223.html explains that a dictionary takes up 144 bytes in memory and that fits very nicely with the savings I saw with the X and Y classes above. -- Martin Geisler VIFF (Virtual Ideal Functionality Framework) brings easy and efficient SMPC (Secure Multi-Party Computation) to Python. See: http://viff.dk/.

For the sake of completeness: I saved this as http://twistedmatrix.com/trac/ticket/3245

On 2008-05-19, 22:45:26 Martin Geisler <mg@daimi.au.dk> wrote:
"Michal Pasternak" <michal.dtz@gmail.com> writes:
[...]
I don't think that there's anything wrong with this example in terms of being written using Twisted Python.
This is the point where I think you answered your own question. If you do stuff in parallel, you use more Deferreds and more memory. I will repeat myself now: I would rather try to find other places to optimize VIFF than reduce the memory footprint of Deferred. I don't have a CS degree and I'm not a mathematician, but from what you wrote it seems it might be worth trying to implement something stack-based (like RPN) rather than the tree approach you seem to be using now -- of course, only if having a stack is something you can do with secure multiparty computations. -- m

Michał Pasternak <michal.dtz@gmail.com> writes:
Okay, thanks for looking at it!
Yes, I guess you are right... :-) The design tries to do everything at once with as much parallelism as possible. But since bandwidth is limited, I guess I could achieve the same throughput by scheduling things at a slower rate. Maybe using something like this:

  class map_parallel(Deferred):
      def __init__(self, iterable, func, max):
          Deferred.__init__(self)
          self.iter = iterable
          self.func = func
          self.running = 0
          for _ in range(max):
              self.running += 1
              self.run_next(None)

      def run_next(self, arg):
          try:
              e = self.iter.next()
              e.addCallback(self.func)
              e.addCallback(self.run_next)
          except StopIteration:
              # The iterator is exhausted.
              self.running -= 1
              if self.running == 0 and not self.called:
                  # All callbacks have finished.
                  self.callback(None)
          # Pass the argument through unchanged.
          return arg

where map_parallel will map a function over a list of Deferreds, but will only run up to max function applications in parallel. I will experiment with it, and if it works well, then it would be cool to have a class like this in Twisted.
Thanks for the ideas! You could certainly do what you suggest, changing the schedule like that is okay from a security point of view. And if I can delay allocating memory until needed by doing so, then it will definitely be interesting to try. -- Martin Geisler VIFF (Virtual Ideal Functionality Framework) brings easy and efficient SMPC (Secure Multi-Party Computation) to Python. See: http://viff.dk/.

Martin Geisler wrote:
Something like DeferredQueue? :-) http://twistedmatrix.com/trac/browser/trunk/twisted/internet/defer.py#L950 -- Nicola Larosa - http://www.tekNico.net/ My dream is that in 500 years (or less) humanity will look back at what passes for psychology today in the same way that we now look back on the dark ages, when they were drilling holes in people's heads to drain out the crazy. Oh wait, we were still doing that in the twentieth century (http://en.wikipedia.org/wiki/Lobotomy). - Phillip J. Eby, February 2008

Nicola Larosa <nico@teknico.net> writes:
Hehe, I can see that this does something that looks very similar :-) I think a DeferredQueue is a bit more general than my code -- I think you need some code in addition to a DeferredQueue in order to obtain what I want: limited parallelism. In particular, I guess you need to catch the QueueUnderflow exceptions and then somehow know when it is time to call get() again. -- Martin Geisler VIFF (Virtual Ideal Functionality Framework) brings easy and efficient SMPC (Secure Multi-Party Computation) to Python. See: http://viff.dk/.

* Martin Geisler <mg@daimi.au.dk> [2008-05-21 13:03:30 +0200]:
In particular, I guess you need to catch the QueueUnderflow exceptions and then somehow know when it is time to call get() again.
DeferredSemaphore may be more appropriate. -- mithrandi, i Ainil en-Balandor, a faer Ambar
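[Editor's note: DeferredSemaphore hands out a fixed number of tokens; its run() method acquires a token, runs the given function, and releases the token when the returned Deferred fires. Stripped of Deferreds, the core idea looks roughly like this toy sketch (not Twisted's actual implementation):]

```python
class TokenLimiter:
    """Sketch of DeferredSemaphore's idea: at most `tokens` tasks run
    at once; further tasks queue until a running one finishes."""
    def __init__(self, tokens):
        self.tokens = tokens
        self.waiting = []   # queued zero-argument task functions

    def run(self, task):
        if self.tokens > 0:
            self.tokens -= 1   # token available: start immediately
            task()
        else:
            self.waiting.append(task)   # no token: queue the task

    def release(self):
        """Called when a running task finishes."""
        if self.waiting:
            self.waiting.pop(0)()   # hand the token straight to the next task
        else:
            self.tokens += 1

limiter = TokenLimiter(2)
started = []
for i in range(4):
    limiter.run(lambda i=i: started.append(i))
print(started)  # [0, 1] -- only the first two ran
limiter.release()
print(started)  # [0, 1, 2] -- a queued task was started
```

This gives exactly the "limited parallelism" asked for above: the queue length bounds memory for pending work, and each completion admits the next task.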

participants (7)
- Marcin Kasperski
- Martin Geisler
- Michal Pasternak
- Michał Pasternak
- Nicola Larosa
- Steve Holden
- Tristan Seligmann