[Python-ideas] Addition to I/O stack: send()
solipsis at pitrou.net
Sun Apr 19 00:56:55 CEST 2009
> I'll notice that the implementation can be tweaked in different ways:
> * Some operating systems do support this operation natively (through
> syscalls like "sendfile"), which is faster because the data does not
> roundtrips to user space. This is useful eg. for webservers serving files
> to network.
> * In the user-space implementation, the buffer size could match whatever
> buffer already exists for a buffered file reader (eg: BufferedIOBase and
I think it all sounds like premature optimization. If you have any workload
where the cost of user/kernel switching or of copying the data measurably
affects the overall program speed (rather than e.g. the Python interpretation
overhead), it would be interesting to hear about it.
(I'm not saying that copying stuff is free; actually, the 3.1 I/O stack somewhat
tries to minimize copying of data between the various layers, by using e.g.
memoryviews and readinto(). But I don't think it is worth going to extremes just
to avoid all memory copies. The cost of a single non-optimized
method call is likely to be higher than memcpy'ing, say, 1024 bytes...)
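To make the readinto()/memoryview point concrete, here is a small sketch (names and the checksum task are illustrative, not part of the stdlib API being discussed): readinto() fills a preallocated buffer instead of allocating a fresh bytes object per read, and slicing a memoryview exposes just the filled portion without copying it.

```python
import io

def checksum(stream, bufsize=1024):
    """Sum the byte values of a stream, reusing one buffer throughout."""
    buf = bytearray(bufsize)        # allocated once, reused for every read
    view = memoryview(buf)
    total = 0
    while True:
        n = stream.readinto(buf)    # fills buf in place, returns bytes read
        if not n:
            break
        total = (total + sum(view[:n])) & 0xFFFFFFFF  # slice of a memoryview: no copy
    return total

data = io.BytesIO(b"abc" * 1000)
print(checksum(data))  # → 294000
```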
Right now, it is true there is no way to do this kind of thing. There are two
distinct things you are asking for really:
1/ direct access to the buffered data of a BufferedIOBase object without copying
2/ automagical optimization of file descriptor-to-file descriptor copying
through something like sendfile(), when supported by the OS
Also, please notice that 1 and 2 are exclusive :-) In 1 you are doing buffered
I/O, while in 2 you are doing raw I/O.
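For contrast, here is what the plain user-space approach (case 2 done without OS support) looks like as a sketch: each chunk is copied from the kernel into a Python bytes object and then back into the kernel, which is exactly the round-trip through user space that a native sendfile() avoids. The function name and buffer size are illustrative.

```python
import os

def copy_fd(in_fd, out_fd, bufsize=64 * 1024):
    """Copy all data from one file descriptor to another, in user space."""
    copied = 0
    while True:
        chunk = os.read(in_fd, bufsize)   # kernel -> user space copy
        if not chunk:
            break
        off = 0
        while off < len(chunk):           # os.write() may write partially
            off += os.write(out_fd, chunk[off:])  # user space -> kernel copy
        copied += len(chunk)
    return copied
```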
If you really need über-fast copy between two file descriptors, the solution
is probably to whip up a trivial C extension that simply wraps around the
sendfile() system call. You could propose it for inclusion in the stdlib, in the
os module, by the way.
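Pending such a C extension, the same wrapping can even be sketched from pure Python with ctypes (Linux only; the signature follows sendfile(2): ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count)). This is a stand-in to show what the wrapper would expose, not a proposed implementation:

```python
import ctypes
import os

# dlopen(NULL) gives access to symbols already linked into the process,
# which on Linux includes libc's sendfile().
libc = ctypes.CDLL(None, use_errno=True)
libc.sendfile.restype = ctypes.c_ssize_t
libc.sendfile.argtypes = (ctypes.c_int, ctypes.c_int,
                          ctypes.c_void_p, ctypes.c_size_t)

def sendfile(out_fd, in_fd, count):
    """Copy up to `count` bytes from in_fd to out_fd inside the kernel."""
    sent = libc.sendfile(out_fd, in_fd, None, count)  # NULL offset: use file position
    if sent == -1:
        errno = ctypes.get_errno()
        raise OSError(errno, os.strerror(errno))
    return sent
```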