Re: [Twisted-Python] Twisted-Python Digest, Vol 67, Issue 22
Hello Valeriy,

I tried the thing you suggested, and I attached the (updated) code. Unfortunately, the new code was even slower, producing the following results:

*** Starting Asynchronous Benchmarks. (Using Twisted, with "deferred-decorator")
-> Asynchronous Benchmark (1 runs) Completed in 56.0279998779 seconds.
-> Asynchronous Benchmark (10 runs) Completed in 56.0130000114 seconds.
-> Asynchronous Benchmark (100 runs) Completed in 56.010999918 seconds.
-> Asynchronous Benchmark (1000 runs) Completed in 56.0410001278 seconds.
-> Asynchronous Benchmark (10000 runs) Completed in 56.3069999218 seconds.
-> Asynchronous Benchmark (100000 runs) Completed in 58.8910000324 seconds.
*** Asynchronous Benchmarks Completed in 59.4659998417 seconds.

I suspect that this is even more inefficient because, with the deferToThread function in place, every single operation is executed in its own thread, which means (1 x 2) + (10 x 2) + (100 x 2) + (1000 x 2) + (10000 x 2) + (100000 x 2) threads... which is... a lot.

Maybe the problem lies in the way I test the code? I understand that by generating the deferreds in a for-loop this way, a lot of deferreds are created before the reactor starts calling their callbacks. Would there be another, better way to test the code? The reason I need to know which one is faster (async vs. sync functions) is that I need to decide whether or not I should re-evaluate the code I just recently finished building.

Any other ideas, maybe?

Thanks in advance,
Dirk

________________________________________________________________________________
Message: 3
Date: Tue, 13 Oct 2009 09:41:19 -0400
From: Valeriy Pogrebitskiy
Subject: Re: [Twisted-Python] Twisted Python vs. "Blocking" Python: Weird performance on small operations.
To: Twisted general discussion
Message-ID:
Content-Type: text/plain; charset="us-ascii"

Dirk,
Using a deferred directly in your bin2intAsync() may be somewhat less efficient than the approach described in Recipe 439358: [Twisted] From blocking functions to deferred functions (http://code.activestate.com/recipes/439358/).

You would get the same effect (asynchronous execution), but potentially more efficiently, by just decorating your synchronous methods as:
from twisted.internet.threads import deferToThread
deferred = deferToThread.__get__
....

@deferred
def int2binAsync(anInteger):
    # Packs an integer, result is 4 bytes
    return struct.pack("i", anInteger)

@deferred
def bin2intAsync(aBin):
    # Unpacks a bytestring into an integer
    return struct.unpack("i", aBin)[0]
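[Editorial note] For readers without Twisted at hand, the same decorator idea can be sketched with only the standard library: a hypothetical `to_future` decorator standing in for `deferToThread`, with a `concurrent.futures.Future` playing the role of the Deferred. The names here are illustrative, not from the original mail:

```python
import struct
from concurrent.futures import ThreadPoolExecutor
from functools import wraps

_pool = ThreadPoolExecutor(max_workers=4)

def to_future(func):
    # Hypothetical stand-in for deferToThread: run the wrapped
    # synchronous function in a worker thread and return a Future.
    @wraps(func)
    def wrapper(*args, **kwargs):
        return _pool.submit(func, *args, **kwargs)
    return wrapper

@to_future
def int2binAsync(anInteger):
    # Packs an integer, result is 4 bytes
    return struct.pack("i", anInteger)

@to_future
def bin2intAsync(aBin):
    # Unpacks a bytestring into an integer
    return struct.unpack("i", aBin)[0]

print(bin2intAsync(int2binAsync(42).result()).result())  # -> 42
```

As with deferToThread, each call still pays a thread-pool dispatch, which is the overhead Dirk measures below.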
Kind regards,
Valeriy Pogrebitskiy
vpogrebi@verizon.net
On Oct 13, 2009, at 9:18 AM, Dirk Moors wrote:
Hello Everyone!
My name is Dirk Moors, and for four years now I've been involved in developing a cloud computing platform, using Python as the programming language. A year ago I discovered Twisted Python, and it got me very interested, up to the point where I made the decision to convert our platform (in progress) to a Twisted platform. One year later I'm still very enthusiastic about the overall performance and stability, but last week I encountered something I didn't expect:
It appeared to be less efficient to run small "atomic" operations in separate deferred callbacks than to run these "atomic" operations together in "blocking" mode. Am I doing something wrong here?
To prove the problem to myself, I created the following example (full source and test code is attached):
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
import struct

from twisted.internet import defer, reactor

def int2binAsync(anInteger):
    def packStruct(i):
        # Packs an integer, result is 4 bytes
        return struct.pack("i", i)

    d = defer.Deferred()
    d.addCallback(packStruct)
    reactor.callLater(0, d.callback, anInteger)
    return d

def bin2intAsync(aBin):
    def unpackStruct(p):
        # Unpacks a bytestring into an integer
        return struct.unpack("i", p)[0]

    d = defer.Deferred()
    d.addCallback(unpackStruct)
    reactor.callLater(0, d.callback, aBin)
    return d
def int2binSync(anInteger):
    # Packs an integer, result is 4 bytes
    return struct.pack("i", anInteger)

def bin2intSync(aBin):
    # Unpacks a bytestring into an integer
    return struct.unpack("i", aBin)[0]
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
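[Editorial note] A single synchronous round trip from the listing above can be exercised like this, with plain Python and no reactor (the test value is arbitrary):

```python
import struct

def int2binSync(anInteger):
    # Packs an integer, result is 4 bytes
    return struct.pack("i", anInteger)

def bin2intSync(aBin):
    # Unpacks a bytestring into an integer
    return struct.unpack("i", aBin)[0]

# One benchmark "run": pack, unpack, and verify the round trip.
value = 123456
assert bin2intSync(int2binSync(value)) == value
print(len(int2binSync(value)))  # -> 4 (a packed "i" is 4 bytes)
```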
Running the test code, I got the following results:
(1 run = converting an integer to a byte string, converting that byte string back to an integer, and finally checking whether that last integer is the same as the input integer.)
*** Starting Synchronous Benchmarks. (No Twisted => "blocking" code)
-> Synchronous Benchmark (1 runs) Completed in 0.0 seconds.
-> Synchronous Benchmark (10 runs) Completed in 0.0 seconds.
-> Synchronous Benchmark (100 runs) Completed in 0.0 seconds.
-> Synchronous Benchmark (1000 runs) Completed in 0.00399994850159 seconds.
-> Synchronous Benchmark (10000 runs) Completed in 0.0369999408722 seconds.
-> Synchronous Benchmark (100000 runs) Completed in 0.362999916077 seconds.
*** Synchronous Benchmarks Completed in 0.406000137329 seconds.
*** Starting Asynchronous Benchmarks. (Twisted => "non-blocking" code)
-> Asynchronous Benchmark (1 runs) Completed in 34.5090000629 seconds.
-> Asynchronous Benchmark (10 runs) Completed in 34.5099999905 seconds.
-> Asynchronous Benchmark (100 runs) Completed in 34.5130000114 seconds.
-> Asynchronous Benchmark (1000 runs) Completed in 34.5859999657 seconds.
-> Asynchronous Benchmark (10000 runs) Completed in 35.2829999924 seconds.
-> Asynchronous Benchmark (100000 runs) Completed in 41.492000103 seconds.
*** Asynchronous Benchmarks Completed in 42.1460001469 seconds.
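[Editorial note] One thing worth observing about these figures: the asynchronous runs share a roughly constant ~34.5 s floor regardless of run count, so the floor looks like a fixed cost of the test harness rather than of the operations themselves. The marginal cost per extra run is tiny:

```python
# Going from 1 run to 100000 runs adds only ~7 s, so the per-run
# overhead of the deferred/callLater machinery is on the order of
# tens of microseconds -- the ~34.5 s baseline dominates.
marginal = (41.492000103 - 34.5090000629) / (100000 - 1)
print(round(marginal * 1e6, 1))  # -> 69.8 (microseconds per run)
```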
Am I really seeing a factor of 100x??
I really hope that I made a huge reasoning error here, but I just can't find it. If my results are correct, then I really need to go and check my entire cloud platform for the places where I decided to split functions into atomic operations, thinking that it would improve performance when in fact it did the opposite.
I personally suspect that I lose my CPU cycles to the reactor scheduling the deferred callbacks. Would that assumption make any sense? The place where I need these conversion functions is in marshalling/protocol reading and writing throughout the cloud platform, which implies that these functions will be called constantly, so I need them to be superfast. I always thought I had to split the entire marshalling process into small atomic (deferred-callback) functions to be efficient, but these figures tell me otherwise.
I really hope someone can help me out here.
Thanks in advance, Best regards, Dirk Moors
_______________________________________________
Twisted-Python mailing list
Twisted-Python@twistedmatrix.com
http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
Dirk,

I hope you are using 'twisted.trial.unittest' instead of standard Python's 'unittest'... Right? In case this is not so, update your test script to use Twisted's unittest module.

Kind regards,
Valeriy Pogrebitskiy
vpogrebi@verizon.net

On Oct 13, 2009, at 10:18 AM, Dirk Moors wrote:
Hello Valeriy,
I tried the thing you suggested, and I attached the (updated) code. Unfortunately, the new code was even slower, producing the following results:
*** Starting Asynchronous Benchmarks. (Using Twisted, with "deferred-decorator")
-> Asynchronous Benchmark (1 runs) Completed in 56.0279998779 seconds.
-> Asynchronous Benchmark (10 runs) Completed in 56.0130000114 seconds.
-> Asynchronous Benchmark (100 runs) Completed in 56.010999918 seconds.
-> Asynchronous Benchmark (1000 runs) Completed in 56.0410001278 seconds.
-> Asynchronous Benchmark (10000 runs) Completed in 56.3069999218 seconds.
-> Asynchronous Benchmark (100000 runs) Completed in 58.8910000324 seconds.
*** Asynchronous Benchmarks Completed in 59.4659998417 seconds.
I suspect that this is even more inefficient because, with the deferToThread function in place, every single operation is executed in its own thread, which means (1 x 2) + (10 x 2) + (100 x 2) + (1000 x 2) + (10000 x 2) + (100000 x 2) threads... which is... a lot.
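[Editorial note] Dirk's thread count works out as follows. Strictly speaking these are thread-pool dispatches rather than distinct OS threads (deferToThread reuses a pool), but the per-operation dispatch count is real:

```python
# Each benchmark run performs two operations (pack + unpack), and the
# benchmark sizes are 1, 10, 100, ..., 100000 runs.
total_dispatches = sum(10 ** k for k in range(6)) * 2
print(total_dispatches)  # -> 222222
```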
Maybe the problem lies in the way I test the code? I understand that by generating the deferreds in a for-loop this way, a lot of deferreds are created before the reactor starts calling their callbacks. Would there be another, better way to test the code? The reason I need to know which one is faster (async vs. sync functions) is that I need to decide whether or not I should re-evaluate the code I just recently finished building.
Any other ideas, maybe?
Thanks in advance,
Dirk
participants (2)

- Dirk Moors
- Valeriy Pogrebitskiy