[Twisted-Python] Advice on doing thousands of simultaneous UDP queries with Twisted...
I've written a UDP-based protocol adaptor (TwistedSNMP) where one of the key requirements is the ability to scan thousands of SNMP Agents "simultaneously" (i.e. the requests should be sent asynchronously and retired as the Agents respond).

http://members.rogers.com/mcfletch/programming/index.htm#TwistedSNMP

Writing a simple asynchronous loop myself (poll on a simple socket, send a message from the queue when writable, read one into another queue when readable) allowed for doing a few thousand queries simultaneously, with only a few dozen dropped messages. However, with the Twisted equivalent (UDPTransport with my simple protocol object), I was seeing huge drop rates, so, gathering that Twisted isn't queueing up the UDP requests, I wrote a (byzantine) query-throttling mechanism with Twisted deferreds.

Problem is, it's a byzantine, fragile (and *slow*) solution to what would *seem* to be one of the most common requirements in networked development. Worse yet, because I am seeing such high drop rates I wind up having to batch in very small groups, serially (instead of in parallel), so the primary purpose of the system (fast querying of thousands of agents) is lost. (Instead of taking 1 or 2 minutes to query 800 or so Agents it will take on the order of 10 minutes.)

So, the question: is there a simple way to turn on a buffered mode in UDP transports so that they can deal with queueing up a few thousand messages to send, sending them, then handling a few thousand computers sending replies (within a few seconds of one another)? Or is Twisted handling queueing via some mechanism I haven't discovered yet? Even if I do find a decent queueing mechanism, I'm still left with the problem that timeouts and the like are going to wind up being measured from queueing time, rather than sending time... not an issue if everything gets sent in half a second or so, but a real problem if it takes 8 or 9 seconds just to send the original messages out.

Looking at the udp.Port class, I'm not seeing anything providing a queue; it seems as though there's a non-blocking write and read, but nothing to handle overflows of sends or receives AFAICT, though it looks as though a protocol could do some queueing on incoming data in its datagramReceived... I'm just not sure how that would work.

Thoughts appreciated,
Mike

_______________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://members.rogers.com/mcfletch/
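The kind of hand-rolled loop being described looks roughly like this (a sketch only: the agent address list and the build_request encoder are hypothetical placeholders, not TwistedSNMP code):

    import select, socket, time

    def scan(agent_addresses, build_request, timeout=20.0):
        """Send one pre-encoded request per agent, draining replies as we go."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.setblocking(0)
        outgoing = [(build_request(addr), (addr, 161)) for addr in agent_addresses]
        responses = []
        deadline = time.time() + timeout
        while time.time() < deadline and (outgoing or len(responses) < len(agent_addresses)):
            # Only ask about writability while there is something queued to send.
            wlist = []
            if outgoing:
                wlist = [sock]
            readable, writable, _ = select.select([sock], wlist, [], 0.1)
            if readable:
                data, fromaddr = sock.recvfrom(65535)
                responses.append((fromaddr, data))
            if writable and outgoing:
                datagram, target = outgoing.pop(0)
                try:
                    sock.sendto(datagram, target)
                except socket.error:
                    outgoing.append((datagram, target))  # buffer full; retry later
        return responses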
Have a look at htb.py; you should be able to rate-limit the UDP throughput rather easily. For me, doing web stuff, it was (in its entirety) like this:

    from twisted.application import service, internet
    from twisted.protocols import htb
    from twisted.web import server

    def RateBucket(parent=None):
        bucket = htb.Bucket(parent)
        bucket.rate = 4000
        bucket.maxburst = 12000
        return bucket

    bucketFilter = htb.FilterByHost()
    bucketFilter.bucketFactory = RateBucket

    application = service.Application("simple")
    # Directory and SERVER_PORT come from the application itself
    site_factory = server.Site(Directory())
    internet.TCPServer(SERVER_PORT, site_factory).setServiceParent(application)
    site_factory.protocol = htb.ShapedProtocolFactory(site_factory.protocol, bucketFilter)

Hope that helps.

Stephen Thorne

On Mon, 2004-02-09 at 20:51, Mike C. Fletcher wrote:
I've written a UDP-based protocol adaptor (TwistedSNMP) where one of the key requirements is the ability to scan thousands of SNMP Agents "simultaneously" (i.e. the requests should be asynchronously sent and retired as the Agents respond).
...
Stephen Thorne wrote:
Have a look at htb.py, you should be able to rate limit the udp throughput rather easily.
For me doing web stuff it was (in its entirety) like this:
<bandwidth throttler clipped>
Hope that helps.
It did help, if for no other reason than that it convinced me there wasn't a simple and elegant solution readily available to be dropped in for a UDP system :) .

In the final analysis, what was happening is that individual queries were timing out *before the queries were even sent*, because it was taking so incredibly long to format and send all of the queries. The manually written system was "pull" based: it marked the time-of-sending for timeout purposes deep in the select loop. The Twisted version sets the timeout via a callLater when the request is *submitted*. With 2 or 3 thousand queries, the 20-second timeout would expire before any messages were processed from the incoming port; then the timeouts would all get called, which kept eating up processing time, preventing any messages from getting read, so the port's internal buffer would fill up and more timeouts occurred. Basically nothing would get through. Queueing up the outputs before sending didn't help, btw. Nor would simple bandwidth throttling without modifying the timeout characteristics.

My solution, which is rather low-tech, is to do an effective "yield" after each message is sent, to allow for processing incoming responses. This is done with this little method:

    def smallBatch(self, oids, tables, index=0, iterDelay=.01):
        if index < len(self.proxies):
            proxy = self.proxies[index]
            self.singleProxy(proxy, oids, tables)
            # yield to the reactor before submitting the next query
            reactor.callLater(iterDelay, self.smallBatch, oids, tables, index + 1)
        else:
            dl = defer.DeferredList(self.partialDefers)
            dl.addCallback(self.returnFinal)

(which also has the side effect of introducing bandwidth throttling), but most importantly, it delays setting the timeout on any given query until the query is actually sent (or fairly close to then). Put another way, it gives the system a chance to send the message as soon after submitting it as possible.

Have fun all, and thanks, Stephen,
Mike
Stephen Thorne
On Mon, 2004-02-09 at 20:51, Mike C. Fletcher wrote:
...
only a few dozen dropped messages. However, with the Twisted equivalent (UDPTransport with my simple protocol object), I was seeing huge drop rates, so, gathering that Twisted isn't queueing up the UDP requests, I wrote a (byzantine) query-throttling mechanism with Twisted defers.
...
_______________________________________ Mike C. Fletcher Designer, VR Plumber, Coder http://members.rogers.com/mcfletch/
Mike C. Fletcher wrote:
If for no other reason than that it convinced me there wasn't a simple and elegant solution readily available to be dropped in for a UDP system :) . In the final analysis, what was happening is that individual queries were timing out *before the queries were even sent* because it was taking so incredibly long to format and send all of the queries.
You've hit on my point with SNMP: I've tried to build an SNMP poller based on Twisted and PySNMP. The main trouble I've had is the decoding time of the pure-Python ASN.1 parser. It spent about 1 full second decoding (packets are about 7 KB, as I get all values and tables with a GET-BULK; there is room for improvement by splitting requests, but that's not significant for this matter). So I switched to a threaded implementation using yapsnmp (and ugly deferToThreads). Decoding times are about 0.02 seconds for the same requests. Load on the server was more than halved.

Seems that you are hit by the same kind of problems...

I've dug through the source of UCD-SNMP but was not able to see a simple hack to integrate the UCD packet decoding and the Twisted loop, especially with the parts dealing with SNMPv3 (UCD- and NET-SNMP almost force you to use their select wrapper...). Moreover, the last yapsnmp release is based on v4.3, so a little out of date.

As my solution works for my (smaller) needs, I currently have no plan to improve it, but if anyone is interested in bringing a full high-performance ASN.1 coder/decoder to Twisted, I'd be willing to help.

--
Thomas FAVIER
.accelance msp
Tel. +33 4 26 29 12 22
http://www.accelance.fr
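The shape of that threaded-decode approach, sketched roughly (decode_response and handle_decoded are hypothetical stand-ins for the actual yapsnmp decode call and the application callback, not its real API):

    from twisted.internet import threads
    from twisted.internet.protocol import DatagramProtocol

    class ThreadedDecodeSNMP(DatagramProtocol):
        def __init__(self, decode_response, handle_decoded):
            self.decode_response = decode_response  # CPU-heavy ASN.1/BER decode
            self.handle_decoded = handle_decoded    # called back in the reactor thread

        def datagramReceived(self, datagram, address):
            # Push the expensive decode into the thread pool so the reactor
            # keeps reading datagrams instead of stalling on parsing.
            d = threads.deferToThread(self.decode_response, datagram)
            d.addCallback(self.handle_decoded, address)
            d.addErrback(lambda failure: failure.printTraceback())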
Thomas Favier wrote:
Mike C. Fletcher wrote:
If for no other reason than that it convinced me there wasn't a simple and elegant solution readily available to be dropped in for a UDP system :) . In the final analysis, what was happening is that individual queries were timing out *before the queries were even sent* because it was taking so incredibly long to format and send all of the queries.
You bring my point with SNMP : I've tried to build a SNMP poller based on twisted and pySNMP. Main trouble i've had is the decoding time of the pure python ASN parser. It spent about 1 full second decoding (packets are about 7ko, as i get all values and tables with a GET-BULK, there is room for improvement by splitting requests, but that's not significant for that matter). So i switched to a threaded implementation using yapsnmp (and ugly deferToThreads). Decoding times are about 0.02 seconds for the same requests. Load on the server was more than halved.
Seems that you are hit by the same kind of problems...
...
but if anyone is interested in bringing a full high performance ASN coder-decoder to twisted, i'd be willing to help.
Ilya (creator of PySNMP) just released a new version of PySNMP yesterday. It feels noticeably faster than the 3.3.x version. I don't have time to do timings with it, but I'd guess it's probably taking 3/4 of the time of the previous version. Still likely not 0.02 seconds on get-bulk requests, but noticeably faster.

Profiling the ASN.1 parser and writing a small accelerator module would likely be quite doable. From what I've seen there's not a lot of complex code in there; it's just hidden deep in the package structure. The nice thing about that approach is that it's possible to port to new platforms and only worry about the C module being present if you need the speed there.

My primary interest here isn't really speed: we query 8000 agents at a time, and only need to query once every few *hours*. The flexibility & robustness of the code is my primary interest. Too much C gets me worried :) , particularly with networking code, where buffer overflows and the like are a pain to avoid.

Have fun,
Mike

_______________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://members.rogers.com/mcfletch/
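Finding those hot spots is mostly a matter of running the decoder under the standard profiler, along these lines (decode_message stands in for the PySNMP decode entry point and sample_packets for a list of captured response datagrams; both are hypothetical):

    import profile, pstats

    def profile_decoder(decode_message, sample_packets):
        prof = profile.Profile()
        prof.runcall(lambda: [decode_message(pkt) for pkt in sample_packets])
        stats = pstats.Stats(prof)
        stats.sort_stats('cumulative')
        stats.print_stats(20)  # the top twenty entries usually name the culprit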
Ilya (creator of PySNMP) just released a new version of PySNMP yesterday.
As a data point, TwistedSNMP just appeared on Daily Python-URL:

Two Python SNMP packages updated
http://www.pycs.net/users/0000231/weblog/2004/02/05.html

--
Nicola Larosa - nico@tekNico.net

"Hope" is the thing with feathers - That perches in the soul -
And sings the tune without the words - And never stops - at all -
   -- Emily Dickinson
but if anyone is interested in bringing a full high performance ASN coder-decoder to twisted, i'd be willing to help.
Ilya (creator of PySNMP) just released a new version of PySNMP yesterday. It's noticeably faster feeling than the 3.3.x version. I don't have time to do timings with it, but I'd guess it's probably taking 3/4 of the time of the previous version. Still likely not 0.02 seconds on getbulk requests, but noticeably faster.
Just a little bit of clarification: my ASN.1 decoder is designed in a way similar to a top-down text parser -- it reads an octet stream (text) and builds a tree of ASN.1 objects (an AST). Depending on the mix-ins to those ASN.1 objects, walking the tree may produce something concrete.

Perhaps the most noticeable bottleneck lies in the fact that building a tree of ASN.1 objects involves massive class instantiation. In the latest pysnmp code (3.4.x) I attempted to save on object creation by caching and reusing ASN.1 objects once created.

To make the most of this trick I'd suggest caching and re-using ASN.1 objects (SNMP message objects, for example) in your application whenever possible.

-ilya
Ilya Etingof wrote:
In the latest pysnmp code (3.4.x) I attempted to save on object creation by caching and reusing once created ASN1 objects.
To make the most of this trick I'd suggest to cache and re-use ASN1 objects (SNMP message objects for example) at your application whenever possible.
Hmm...

http://151.200.36.43:8080/evil/freelist.py

Courtesy of Jp Calderone.

--
 Twisted | Christopher Armstrong: International Man of Twistery
  Radix  | Release Manager, Twisted Project
---------+ http://radix.twistedmatrix.com/
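As a generic illustration of the reuse trick Ilya describes (a sketch only, not Jp's freelist.py or the PySNMP API):

    class FreeList:
        """Hand out recycled objects instead of instantiating new ones."""
        def __init__(self, factory, maxsize=256):
            self.factory = factory      # builds a fresh object when the pool is empty
            self.maxsize = maxsize
            self.pool = []

        def acquire(self):
            if self.pool:
                return self.pool.pop()
            return self.factory()

        def release(self, obj):
            # Caller is responsible for clearing any per-request state first.
            if len(self.pool) < self.maxsize:
                self.pool.append(obj)

    # e.g.: message_pool = FreeList(SomeSNMPMessageClass)
    #       msg = message_pool.acquire(); ...; message_pool.release(msg)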
Thomas Favier wrote:
... Main trouble i've had is the decoding time of the pure python ASN parser. It spent about 1 full second decoding... So i switched to a threaded implementation using yapsnmp (and ugly deferToThreads). Decoding times are about 0.02 seconds for the same requests. ...
Have you considered doing some profiling, then using Psyco or Pyrex to remove the bottlenecks? This seems, to me, like something they'd be great at.
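For what it's worth, the Psyco route is only a couple of lines (a sketch; the selective psyco.bind() target would be whatever function profiling points at):

    try:
        import psyco
        psyco.full()           # or, more selectively: psyco.bind(hot_decode_function)
    except ImportError:
        pass                   # Psyco is x86/CPython only; fall back to plain Python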
On Tue, Feb 10, 2004 at 10:03:56AM +0100, Thomas Favier wrote:
... Main trouble i've had is the decoding time of the pure python ASN parser. It spent about 1 full second decoding... So i switched to a threaded implementation using yapsnmp (and ugly deferToThreads). Decoding times are about 0.02 seconds for the same requests. ...
Please forgive the self-linking, but you might like to take a look at libsnmp: http://seafelt.unicity.com.au/libsnmp. I wrote it to replace pySNMP in a product we're working on, for various reasons, speed being one of them. The main difference between the two is that libsnmp isn't a generic ASN.1 decoder; we've hardcoded things specific to SNMP to make things faster. It seems to scale quite well so far.

--
"You've got that exactly backwards. Windows was a boot loader that was
mutilated and tortured by sadistic psychopaths of the highest order in
order to make it superficially resemble an OS." -- Paul Tomblin in a.s.r
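To give a flavour of the "hardcode the SNMP subset" idea (a sketch, not libsnmp's actual code): SNMP packets are BER encoded, and a decoder that only handles the definite-length forms SNMP uses can stay tiny:

    def read_tlv(data, offset=0):
        """Return (tag, value_bytes, next_offset) for one BER element."""
        tag = ord(data[offset])
        length_byte = ord(data[offset + 1])
        if length_byte < 0x80:                 # short form: length in one byte
            length, header = length_byte, 2
        else:                                  # long form: low bits count the length octets
            num_octets = length_byte & 0x7f
            length = 0
            for i in range(num_octets):
                length = (length << 8) | ord(data[offset + 2 + i])
            header = 2 + num_octets
        start = offset + header
        return tag, data[start:start + length], start + length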
On Mon, 2004-02-09 at 05:51, Mike C. Fletcher wrote:
Writing a simple asynchronous loop myself (poll on a simple socket, send message from queue when writable, read one into other queue when readable) allowed for doing a few thousand queries simultaneously), with only a few dozen dropped messages. However, with the Twisted equivalent (UDPTransport with my simple protocol object), I was seeing huge drop rates, so, gathering that Twisted isn't queueing up the UDP requests,
Possibly we need some generic UDP queueing, rather than the current system. Patches are welcome.

--
Itamar Shtull-Trauring   http://itamarst.org
Looking for a job: http://itamarst.org/resume.html
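One possible shape for that, sketched here rather than taken from anything in Twisted itself: a small wrapper that queues datagrams and drains a bounded number per reactor tick instead of pushing them all out in one burst.

    from twisted.internet import reactor

    class QueuedWrites:
        """Pace transport.write() calls for an (unconnected) UDP port."""
        def __init__(self, transport, per_tick=50, interval=0.01):
            self.transport = transport
            self.per_tick = per_tick    # datagrams flushed per scheduled flush
            self.interval = interval
            self.queue = []
            self.flushing = False

        def write(self, datagram, address):
            self.queue.append((datagram, address))
            if not self.flushing:
                self.flushing = True
                reactor.callLater(0, self._flush)

        def _flush(self):
            for datagram, address in self.queue[:self.per_tick]:
                self.transport.write(datagram, address)
            del self.queue[:self.per_tick]
            if self.queue:
                reactor.callLater(self.interval, self._flush)
            else:
                self.flushing = False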
Mike C. Fletcher wrote:
Writing a simple asynchronous loop myself (poll on a simple socket, send message from queue when writable, read one into other queue when readable) allowed for doing a few thousand queries simultaneously), with
A word of warning here. Nothing prevents your outgoing socket from claiming writability all the time, and the OS from dropping your packets due to lack of buffer space. UDP just isn't reliable. Been there, got the T-shirt. You might be able to do some hacks with SIOCOUTQ, but even that depends on the OS; IIRC on Linux 2.4 it's not useful for UDP.

Your best course of action is to avoid bursts. Here's a nice and simple algorithm: use a token bucket filter just like suggested elsewhere in this thread, but add tokens both per unit of time and per received response. Don't actually use the TBF as a filter, or even queue packets before it; use it to see when you should be generating more requests and when not.

As long as there are tokens in the bucket, generate a request and send it. When the bucket is empty, return and try again later. With reactor.callLater(), add tokens to the bucket and do the above processing. When receiving a packet, add a token to the bucket and do the above processing.

The more you get replies, the more work gets done; dropped packets slow things down, as they should. Time-based token adding is there mostly to be robust against packet loss; in the normal case replies should be the ones filling the bucket. The trick is to continue work as soon as possible, based on replies (instead of waiting for all n requests to complete before continuing, or until a timer expires).
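A rough sketch of that reply-driven bucket, where send_next_request is a hypothetical hook that sends one pending query and returns False once none are left:

    from twisted.internet import reactor

    class ReplyPacedSender:
        def __init__(self, send_next_request, capacity=20, refill=5, interval=1.0):
            self.send_next_request = send_next_request
            self.capacity = capacity   # bucket size, i.e. the largest burst allowed
            self.refill = refill       # tokens added per timer tick (loss safety net)
            self.interval = interval
            self.tokens = capacity
            reactor.callLater(self.interval, self._tick)

        def _pump(self):
            # Send as long as there are tokens and work left to do.
            while self.tokens > 0:
                if not self.send_next_request():
                    break
                self.tokens -= 1

        def _tick(self):
            # Time-based refill keeps things moving even if replies are lost.
            self.tokens = min(self.capacity, self.tokens + self.refill)
            self._pump()
            reactor.callLater(self.interval, self._tick)

        def response_received(self):
            # Each reply earns a token, so throughput tracks what agents can handle.
            self.tokens = min(self.capacity, self.tokens + 1)
            self._pump()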
participants (10)

- Anthony Baxter
- Christopher Armstrong
- Ilya Etingof
- Itamar Shtull-Trauring
- Justin Warren
- Mike C. Fletcher
- Nicola Larosa
- Stephen Thorne
- Thomas Favier
- Tommi Virtanen