[Twisted-Python] Sending a long string/buffer without copying it
Hi,

In one of my Twisted-based applications, I need to send large strings and buffers. They can be hundreds of MB long (they come from large numpy arrays). I would like to be able to send them *without making any copies* in the process.

This seems to be difficult with the way that certain parts of Twisted are written: in protocols.basic, many of the sendString/sendLine methods do things that make a copy of the string or line to be sent:

    def sendLine(self, line):
        """Sends a line to the other end of the connection.
        """
        return self.transport.write(line + self.delimiter)

If line is 100MB, this just made a second 100MB string. To make things worse, in my case a server needs to send this line to many clients that are connected. The line gets copied for each client! If I have 10 clients, I have nearly a GB worth of extra memory allocated for this temporary copy.

This problem is easy to solve at the protocol level: you just do separate writes for the delimiter and the line. Or, if you are using a length-prefixed protocol, write the length bytes and the string separately.

BUT...

Even if I do that, it appears that Twisted is making copies elsewhere, like in FileDescriptor.doWrite. So, how can I send something without making a copy? I don't mind making copies of slices, just not the whole thing.

Thanks,
Brian
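The protocol-level workaround described above (separate writes for the payload and the delimiter or length prefix) might look roughly like the following. This is only a sketch: RecordingTransport, send_line and send_length_prefixed are hypothetical names used for illustration, and the 4-byte big-endian prefix is an assumed framing, not something stated in the post.

```python
import struct


class RecordingTransport:
    """Hypothetical stand-in for a Twisted transport: records each write."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)


def send_line(transport, line, delimiter=b"\r\n"):
    # Two writes instead of `line + delimiter`, so no second large
    # string is built at the protocol level.
    transport.write(line)
    transport.write(delimiter)


def send_length_prefixed(transport, payload):
    # Assumed framing: 4-byte big-endian length, then the payload itself.
    transport.write(struct.pack("!I", len(payload)))
    transport.write(payload)
```

This only avoids the concatenation in the protocol; as the post points out, the transport may still copy the data internally.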
Brian Granger wrote:
Even if I do that, it appears that Twisted is making copies elsewhere - like in FileDescriptor.doWrite. So, how can I send something without making a copy? I don't mind making copies of slices, just not the whole thing.
You'd need to patch the reactor. Isn't there some talk about a python buffer thingie for this kind of need?
On Thu, 28 Sep 2006 00:11:41 -0600, Brian Granger <ellisonbg.net@gmail.com> wrote:
Even if I do that, it appears that Twisted is making copies elsewhere - like in FileDescriptor.doWrite. So, how can I send something without making a copy? I don't mind making copies of slices, just not the whole thing.
Don't pass the entire thing to a single call to transport.write() (or LineReceiver.sendLine). Instead, write a producer.

Jean-Paul
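A producer along the lines suggested here could be sketched as below. The method names (resumeProducing, stopProducing) and the transport calls (registerProducer, unregisterProducer, write) follow Twisted's pull-producer convention; BufferProducer, RecordingTransport and CHUNK_SIZE are hypothetical names for illustration, and the sketch assumes the data is a bytes or bytearray object. In a real application the class would also declare IPullProducer and be registered with a real transport.

```python
class BufferProducer:
    """Pull-producer sketch: writes one slice of the data per call."""

    CHUNK_SIZE = 64 * 1024  # illustrative value, not a Twisted constant

    def __init__(self, transport, data):
        self.transport = transport
        self.view = memoryview(data)  # slicing a memoryview does not copy
        self.offset = 0

    def start(self):
        # streaming=False asks the transport to treat this as a pull producer.
        self.transport.registerProducer(self, False)

    def resumeProducing(self):
        chunk = self.view[self.offset:self.offset + self.CHUNK_SIZE]
        self.offset += len(chunk)
        if chunk:
            # bytes(chunk) copies only this slice, never the whole buffer.
            self.transport.write(bytes(chunk))
        if self.offset >= len(self.view):
            self.transport.unregisterProducer()

    def stopProducing(self):
        # The consumer went away; make sure nothing more is written.
        self.offset = len(self.view)


class RecordingTransport:
    """Hypothetical test double that records writes instead of sending."""

    def __init__(self):
        self.written = []
        self.producer = None

    def registerProducer(self, producer, streaming):
        self.producer = producer

    def unregisterProducer(self):
        self.producer = None

    def write(self, data):
        self.written.append(data)
```

With the test double, the producer can be driven by hand by calling resumeProducing() until it unregisters itself; with a real transport, the transport makes those calls when it is ready for more data.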
But a producer will just make sure the whole thing isn't copied at the same time, right? It still does many smaller copies: while the memory is saved, there is still the performance hit.

I just wanted to make sure that I wasn't missing something obvious. I think the right way of doing this is to use a true rw buffer, such as those created by numpy.

On 9/28/06, Jean-Paul Calderone <exarkun@divmod.com> wrote:
Don't pass the entire thing to a single call to transport.write() (or LineReceiver.sendLine). Instead, write a producer.
Jean-Paul
_______________________________________________ Twisted-Python mailing list Twisted-Python@twistedmatrix.com http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
That's going to happen anyway. The buffers for a socket are a few kilobytes. While it's possible in C to get away with zero copy if you try hard enough, it's not really possible from Python.

Your bottleneck is almost certainly going to be I/O, not memcpy. You're worrying too much about what amounts to premature optimization.

-bob

On 9/28/06, Brian Granger <ellisonbg.net@gmail.com> wrote:
But a producer will just make sure the whole thing isn't copied at the same time, right? It still does many smaller copies: while the memory is saved, there is still the performance hit.
I just wanted to make sure that I wasn't missing something obvious. I think the right way of doing this is to use a true rw buffer, such as those created by numpy.
I'm probably missing something, but your outgoing bytes have to be copied at least once, anyway, into the kernel's address space (unless you're using some kind of trick like sendfile()).

Clearly creating a duplicate of the entire multi-MB string would be bad. But as long as the Producer kept returning the same, say, 4KB buffer (I'm assuming that's possible?), you're only talking about one extra copy per write. That *might* still be significant, but without measuring it, it would be very hard to say.

Just my $.02,
-- Jacob

-----Original Message-----
From: twisted-python-bounces@twistedmatrix.com [mailto:twisted-python-bounces@twistedmatrix.com] On Behalf Of Brian Granger
Sent: Thursday, September 28, 2006 3:48 PM
To: Twisted general discussion
Subject: Re: [Twisted-Python] Sending a long string/buffer without copying it

But a producer will just make sure the whole thing isn't copied at the same time, right? It still does many smaller copies: while the memory is saved, there is still the performance hit.

I just wanted to make sure that I wasn't missing something obvious. I think the right way of doing this is to use a true rw buffer, such as those created by numpy.

On 9/28/06, Jean-Paul Calderone <exarkun@divmod.com> wrote:
Don't pass the entire thing to a single call to transport.write() (or LineReceiver.sendLine). Instead, write a producer.
Jean-Paul
Yes, if the sockets have buffers of a few k, then using a producer should be fine. I will try this out. Thanks! Brian
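The socket buffer size referred to in this thread can be checked rather than assumed. The sketch below queries the kernel's default send-buffer size for a fresh TCP socket using the standard socket module; send_buffer_size is a hypothetical helper name, and the value it reports depends on the operating system and its configuration.

```python
import socket


def send_buffer_size():
    """Return the default send-buffer size, in bytes, of a new TCP socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)


if __name__ == "__main__":
    print("SO_SNDBUF:", send_buffer_size(), "bytes")
```

Comparing this figure with the chunk size a producer writes gives a rough idea of how much data each write can hand to the kernel at once.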
participants (5)
- Bob Ippolito
- Brian Granger
- Jacob Gabrielson
- Jean-Paul Calderone
- Phil Mayers