[Twisted-Python] Implementing streaming (broadcast) TCP servers

I'm working on a server that will be streaming the same data from a source to many clients. It has to be invoked from an HTTP server, so I'm using twisted.web2 (not twisted.web, because it also needs to host a WSGI application). In other words, calling GET on a certain path results in the stream as the response, at which point I don't really need any more HTTP functionality.

I have it working. Currently, the basic process is: I have a class implementing IByteStream, an instance of which is created for each client; it simply keeps a queue of data to be sent, sending one piece with each call to read(), or returning a Deferred if the queue is empty. Meanwhile, the part of the application that provides the data (a different server, receiving data from a broadcaster) adds each piece of data to all the clients' byte stream queues as soon as it receives it.

Since efficiency is obviously important for a streaming server, I was wondering if anyone had any efficiency tips. I've tested it with at most 64 concurrent clients. There doesn't seem to be any slowdown from the clients' perspective (the data is coming in at the rate it should), but server-side, CPU and memory usage start increasing pretty quickly. Especially CPU -- with 64 clients, CPU usage peaked at 26.8%, which is not acceptable.

Any thoughts? Is there a better way to handle the streaming process than the one I described?

- Adam
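For readers following along, the per-client queue design described above can be sketched roughly as follows. This is only an illustration, not Adam's code: it uses the standard library's asyncio instead of twisted.web2's IByteStream, and the names Broadcaster, subscribe, unsubscribe, publish, and demo are hypothetical. Awaiting an empty asyncio.Queue plays the role of read() returning a Deferred.

```python
import asyncio


class Broadcaster:
    """Fans each incoming chunk out to one queue per connected client."""

    def __init__(self):
        self._queues = set()

    def subscribe(self):
        # Call this from inside a running event loop.
        queue = asyncio.Queue()
        self._queues.add(queue)
        return queue

    def unsubscribe(self, queue):
        # Drop the queue so it (and anything still in it) can be collected.
        self._queues.discard(queue)

    def publish(self, chunk):
        # Every queue receives a reference to the same bytes object;
        # the payload itself is not copied here.
        for queue in self._queues:
            queue.put_nowait(chunk)


async def demo():
    hub = Broadcaster()
    first, second = hub.subscribe(), hub.subscribe()
    hub.publish(b"frame-1")
    # get() waits if the queue is empty, much like read() returning a Deferred.
    a = await first.get()
    b = await second.get()
    hub.unsubscribe(first)
    hub.unsubscribe(second)
    return a, b
```

Running asyncio.run(demo()) returns the same chunk for both clients. Note that each publish still costs one queue operation and one wakeup per client, which is the kind of per-client overhead the thread goes on to discuss.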

Adam,

In general, if you are streaming different data to each client, it is going to be expensive. There really are only 3 areas to optimize:

1) Disk IO -- Read in bigger chunks and make sure you flush your buffer. You might want to drop into C and write your own buffering, or find a good "ring buffer" (a big array that you reuse for the stream) so you don't have to allocate and deallocate memory all the time.

2) Network IO -- Check what protocol you can use. If you can get away with lossy streaming, UDP or RTCP might be wins.

3) CPU -- Profile your application and see where the time is being spent. You need to be careful about memory copying and allocation. Python normally handles this well, so this probably isn't the problem.

OOPS, I just realized that you said the same data to multiple clients. Is the data the same at the same time, meaning the clients are all getting the same data simultaneously? If so, and if you have any control over the network, then UDP Multicast can be a big win. You would only be sending out one stream of data and everyone would hear it.

Also, I would look at making sure that you are not copying data around much.

I must admit that my knowledge of all the twisted classes that are available is limited, so I don't know what class to use, but I would suggest dropping down to the low level classes below web2, since you will need a little more control for this than normal web applications.

Carl
zmola@acm.org
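As a rough illustration of the profiling suggestion in point 3, the sketch below runs a simulated fan-out loop under the standard library's cProfile and prints the most expensive calls. The functions fan_out, workload, and profile_report are hypothetical stand-ins, not part of Adam's server; in practice the real reactor loop or request handler would be profiled instead.

```python
import cProfile
import io
import pstats
from collections import deque


def fan_out(chunk, queues):
    """Append one chunk to every client's queue (the suspected hot path)."""
    for queue in queues:
        queue.append(chunk)


def workload(clients=64, chunks=2000):
    """Simulate a broadcaster feeding `clients` queues with `chunks` pieces."""
    queues = [deque() for _ in range(clients)]
    chunk = b"x" * 4096
    for _ in range(chunks):
        fan_out(chunk, queues)
    return queues


def profile_report(func, limit=10):
    """Run func() under cProfile and return the top entries as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


report = profile_report(workload)
print(report)
```

Sorting by cumulative time tends to surface the per-client loop first, which helps decide whether the cost is in the application's own fan-out or deeper in the framework.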

On 28 Jun 2007, at 09.32, Carl Zmola wrote:
> 3) CPU -- Profile your application and see where the time is being spent. You need to be careful about memory copying and allocation. Python normally handles this well, so this probably isn't the problem.

Ah, good point. I'll try that.

> OOPS, I just realized that you said the same data to multiple clients. Is the data the same at the same time, meaning the clients are all getting the same data simultaneously? If so, if you have any control over the network, then UDP Multicast can be a big win. You would only be sending out one stream of data and everyone would hear it.

It is indeed being sent at the same time, but unfortunately I'm not designing the protocol. I'm implementing an existing one that only runs over HTTP.

> Also, I would look at making sure that you are not copying data around much.

I was definitely paying attention to that. I'm using Python's `buffer` type when possible to prevent any duplication of data; ideally, there's only one real copy of it in memory, and a bunch of buffers pointing to it... then it can be garbage collected once it's been sent to all clients.
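The "one real copy, many views" idea can be illustrated with memoryview, which is the Python 3 counterpart of the Python 2 `buffer` type mentioned above. This is a sketch under that assumption, not the code from the thread; split_views and the chunk size are hypothetical.

```python
def split_views(payload, chunk_size):
    """Return zero-copy slices over payload, each at most chunk_size bytes."""
    view = memoryview(payload)
    # Slicing a memoryview creates a new view, not a copy of the bytes.
    return [view[i:i + chunk_size] for i in range(0, len(view), chunk_size)]


payload = bytes(range(256)) * 4096          # 1 MiB of sample data
chunks = split_views(payload, 64 * 1024)    # 16 views of 64 KiB each

# Every view refers back to the single underlying bytes object.
shared = all(chunk.obj is payload for chunk in chunks)
print(len(chunks), shared)
```

The underlying bytes object stays alive as long as any view references it, so it can only be reclaimed once every client's pending views have been dropped.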
> I must admit that my knowledge of all the twisted classes that are available is limited, so I don't know what class to use, but I would suggest dropping down to the low level classes below web2, since you will need a little more control for this than normal web applications.

I had similar thoughts... I was thinking that, once I had the connection established, I could bypass web2's IByteStream machinery and use transport.write directly. Do you think that would help? It's less run-loop overhead, at least.

On 28 Jun 2007, at 11.44, Adam Atlas wrote:
> I had similar thoughts... I was thinking, once I had the connection established, I could try to bypass web2's IByteStream stuff and try to use transport.write directly. Do you think that would help? It's less runloopy overhead, at least.

For the record, this seems to be working. I rewrote it to work like this, and I was able to push it up to 64 clients again (me broadcasting to the remote server from my own computer, and then starting 64 clients on my own computer), but CPU and memory usage were much better. CPU was hovering around 1.5% and memory around 3.6%. (I know these could still be considered SORT OF high, but this is actually a vserver where both of these resources are quite limited -- only 480 MB of the real server's memory.)

However, once all the clients disconnect, the memory doesn't seem to go back down to the expected level. Any ideas on where I should start looking to solve this?
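The direct-write approach reported above might look roughly like the sketch below. It is not Adam's rewrite: StreamHub, attach, detach, and broadcast are hypothetical names, and RecordingTransport is a stand-in for a real transport so the example runs without Twisted. The only thing assumed of a transport is a write(data) method, which Twisted transports provide.

```python
class StreamHub:
    """Keeps the set of live transports and writes each chunk to all of them."""

    def __init__(self):
        self._transports = set()

    def attach(self, transport):
        self._transports.add(transport)

    def detach(self, transport):
        # In a Twisted protocol this would be called from connectionLost,
        # so the hub does not keep a reference to a closed connection.
        self._transports.discard(transport)

    def broadcast(self, chunk):
        # Iterate over a snapshot in case a write triggers a detach.
        for transport in tuple(self._transports):
            transport.write(chunk)

    def __len__(self):
        return len(self._transports)


class RecordingTransport:
    """Stand-in for a real transport; records what write() receives."""

    def __init__(self):
        self.written = []

    def write(self, data):
        self.written.append(data)


hub = StreamHub()
clients = [RecordingTransport() for _ in range(64)]
for client in clients:
    hub.attach(client)

hub.broadcast(b"frame-1")        # one write per client, no per-client queue

for client in clients:
    hub.detach(client)
hub.broadcast(b"frame-2")        # nobody is attached, so nothing is written
```

Detaching on disconnect is one place to check when memory does not drop after clients leave, since any container that still references a closed connection keeps its buffers alive. Whether that is what happened in this case is not known from the thread.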

Adam Atlas wrote:
> However, once all the clients disconnect, the memory doesn't seem to go back down to the expected level. Any ideas on where I should start looking to solve this?

Are you using Python 2.5? It made great progress in giving memory back to the operating system.

--
Nicola Larosa - http://www.tekNico.net/

If you thought Windows Vista would spare you from buying an antivirus, you were wrong. If you thought it was more secure than the old versions of Windows, you were wrong. If you think it can be more secure than Linux or MacOS X, you are badly mistaken.
 -- Alessandro Bottoni, February 2007

On 29 Jun 2007, at 00.41, Nicola Larosa wrote:
> Adam Atlas wrote:
>> However, once all the clients disconnect, the memory doesn't seem to go back down to the expected level. Any ideas on where I should start looking to solve this?
>
> Are you using Python 2.5? It made great progress in giving the memory back to the operating system.

So I've heard. I have 2.5, though I've been testing with 2.4, because this is a program I'm intending to distribute, not just for private use. So I want to keep it backwards-compatible with at least 2.4.

participants (3)
- Adam Atlas
- Carl Zmola
- Nicola Larosa