Writev

Adam DePrince adam at cognitcorp.com
Sun Dec 19 23:12:27 EST 2004


Often I've written code that generates some textual output and dumps it
to a file or socket.  Of course, we are all familiar with the performance
cost (multiple copies of the text data and O(n**2) running time) of the
naive approach:

    string += "some other string"
    string += "another string"

Many other programmers have faced a similar issue; cStringIO,
''.join([mydata]), and map(file.write, [mydata]) are but some of the
attempts at making this process more efficient by jamming the components
to be written into a sequence.
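For concreteness, here is a rough Python 3 rendering of these approaches. It is a sketch under a few assumptions: io.StringIO stands in for cStringIO (which Python 3 no longer has), os.write on a raw descriptor stands in for file.write so that each call maps to one system call (a buffered Python file object would coalesce some of them), and the helper names and sample data are illustrative only.

```python
import io
import os

# Hypothetical output chunks; in a real program these might come from a generator.
mydata = ["header\n", "line 1\n", "line 2\n", "footer\n"]

def via_join(chunks):
    # ''.join copies every chunk into one new string held entirely in memory.
    return "".join(chunks)

def via_stringio(chunks):
    # io.StringIO plays the role cStringIO did in Python 2: an in-memory
    # buffer that accumulates the whole output before it is used.
    buf = io.StringIO()
    for chunk in chunks:
        buf.write(chunk)
    return buf.getvalue()

def via_many_writes(chunks, fd):
    # The map(file.write, ...) pattern on a raw descriptor: no concatenation,
    # but at least one os.write() system call per chunk. Returns the call count.
    calls = 0
    for chunk in chunks:
        data = chunk.encode()
        while data:  # loop in case the kernel accepts only part of the data
            written = os.write(fd, data)
            data = data[written:]
            calls += 1
    return calls
```

The first two produce one big string that still has to be written out; the third streams data as it goes but pays a system call for every piece.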

These approaches have their drawbacks.  

cStringIO and ''.join involve an extra copy of the string, and both hold
the entire output in memory until it is written; with the maturation of
iterators in Python, that is a real cost for output that could otherwise
be produced lazily.  map(file.write, ...), while it does allow data to be
emitted to a file or socket as it is generated, potentially makes an
excessive number of OS write calls.  Even without considering iterators,
eliminating the final concatenation, or drastically reducing the number
of OS calls (each one is a trip into the kernel, and those hurt), could
be a substantial benefit.

Perusing the posix module reveals that the POSIX writev call is not
exposed.  writev is the POSIX answer to this very problem: it takes a
list of buffers and writes them in a single system call, without the
caller concatenating them first.  Being able to pass a list of strings
all the way down to the hardware level would allow some rather
sophisticated hardware to be exploited.  IIRC, some operating systems
(BSD, I believe) have "zero copy" network code; it is conceivable that
the final concatenation of output could be avoided entirely given smart
enough drivers and hardware.
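For readers on a newer interpreter: Python 3.3 and later do expose this call as os.writev(fd, buffers) on Unix-like systems. A minimal sketch, with a pipe standing in for a file or socket and made-up buffer contents, of handing several separate buffers to the kernel in one call:

```python
import os

# Three separate buffers; they are never joined in user space.
header = b"HTTP/1.0 200 OK\r\n"
blank = b"\r\n"
body = b"hello, world\n"

r, w = os.pipe()  # a pipe stands in for a file or socket here
# One writev(2) system call; the kernel gathers the bytes from each buffer.
# The return value is the number of bytes written, which can be short.
written = os.writev(w, [header, blank, body])
os.close(w)

with os.fdopen(r, "rb") as reader:
    received = reader.read()
```

Like write(2), writev may accept fewer bytes than requested (particularly on sockets), so robust code has to resume from wherever the kernel stopped.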

Of course, taking advantage of this requires that writev be exposed.  I
have an implementation of writev.  It is reasonably smart: it "unrolls"
only as many iterator.next() calls as necessary to fill the pointer
(iovec) list for the next invocation of writev.  On many systems (Linux,
BSD) writev is limited to 1024 items per call (IOV_MAX), so to
accommodate users who don't want to peel off and hold in memory 1024
iterator.next() results at once, an optional parameter lets you set the
batch size to a smaller value.
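The batching strategy described above can be sketched on top of Python 3's os.writev. This is not the patch linked below; the names writev_iter and iov_max, the max_items parameter, and the 1024 fallback are hypothetical choices for illustration. It assumes a blocking descriptor and items that are bytes.

```python
import itertools
import os

def iov_max(default=1024):
    # Hypothetical helper. IOV_MAX differs between platforms; fall back to
    # 1024 (the usual Linux/BSD value) when sysconf cannot report it.
    try:
        value = os.sysconf("SC_IOV_MAX")
    except (AttributeError, ValueError, OSError):
        return default
    return value if value > 0 else default

def writev_iter(fd, iterable, max_items=None):
    """Write bytes items from iterable to fd in batches via os.writev.

    Hypothetical helper: at most max_items items (capped at IOV_MAX) are
    pulled from the iterator for each batch, so only one batch is held in
    memory. Assumes a blocking descriptor. Returns total bytes written.
    """
    limit = iov_max()
    batch_size = limit if max_items is None else max(1, min(max_items, limit))
    it = iter(iterable)
    total = 0
    while True:
        # Advance the iterator only as far as the next writev call needs.
        batch = list(itertools.islice(it, batch_size))
        if not batch:
            break
        # Skip empty items; memoryview slices let us resume without copying.
        pending = [memoryview(item) for item in batch if len(item)]
        while pending:
            n = os.writev(fd, pending)
            total += n
            # Drop buffers written in full, then trim a partially written one.
            while pending and n >= len(pending[0]):
                n -= len(pending[0])
                pending.pop(0)
            if pending and n:
                pending[0] = pending[0][n:]
    return total
```

itertools.islice is what keeps this lazy: it calls next() on the iterator at most batch_size times per round, mirroring the "unrolling" described above, and a smaller max_items trades more system calls for less memory held at once.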

I'm not sure where to take this as a "next step."  It seems too small a
change for a PEP.  Any ideas?

You can download the patch from
http://deprince.net/software/writev/index.html


Adam DePrince 




More information about the Python-list mailing list