[Python-ideas] speeding up shutil.copy*()

Gregory P. Smith greg at krypto.org
Sun Mar 3 22:38:05 CET 2013


IMNSHO the *time* is less relevant than the fact that it uses less memory
by not repeatedly making copies.

In general we should use the more recent non-copying APIs when possible
within the standard library, but most of that code is pretty old and has not
been looked at for conversion.  Any such changes are welcome in 3.4+.
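
To make "non-copying" concrete, here is a minimal sketch of a copy loop that
reuses one preallocated buffer via readinto() and a memoryview, instead of
allocating a fresh bytes object on every read(). The function name
copy_fileobj_readinto and the 64 KiB default are illustrative choices, not
existing stdlib API, and the sketch assumes blocking binary file objects
whose write() consumes the whole buffer (true for buffered files and
io.BytesIO).

```python
import io


def copy_fileobj_readinto(src, dst, bufsize=64 * 1024):
    """Copy src to dst through one reusable buffer; return bytes copied.

    Hypothetical helper for illustration. Assumes blocking binary file
    objects; a non-blocking source returning None is treated as EOF here.
    """
    buf = bytearray(bufsize)      # allocated once, reused for every chunk
    view = memoryview(buf)
    total = 0
    while True:
        n = src.readinto(view)    # fills the existing buffer in place
        if not n:
            break
        # Slicing a memoryview does not copy the underlying bytes.
        dst.write(view[:n])
        total += n
    return total


if __name__ == "__main__":
    source = io.BytesIO(b"x" * 200000)
    target = io.BytesIO()
    copied = copy_fileobj_readinto(source, target)
    print(copied, target.getvalue() == source.getvalue())
```

The point is the memory behaviour rather than the speed: the loop touches a
single bytearray for the whole copy, whatever the file size.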


On Sun, Mar 3, 2013 at 11:00 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:

> On Sun, 3 Mar 2013 19:40:04 +0100
> Charles-François Natali
> <cf.natali at gmail.com> wrote:
> >
> > > sendfile() is a Linux-only syscall. It's also limited to certain kinds
> > > of file descriptors. The limitations have been lifted in recent kernel
> > > versions.
> >
> > No, it's not Linux-only: many BSDs also have it, although not all of
> > them support an arbitrary output file descriptor (Solaris does allow
> > regular files too). It would be possible to catch EINVAL/EBADF, and
> > fall back to a regular copy loop.
> >
> > Note that the above benchmark is really biased by writing the data to
> > /dev/null: with a real target file, the zero-copy wouldn't bring such
> > a large gain, because the bottleneck will really be the I/O devices
> > (also a read()/write() loop is more expensive in Python than in C).
>
> Can you post your benchmark's code? I could time it on an SSD.
>
> > But I see at least two cases where it could be interesting: when
> > reading/writing from/to a tmpfs partition, or when the source and
> > target files are on different disks.
>
> That's already nice.
>
> Regards
>
> Antoine.
>
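The fallback suggested in the quoted message (try sendfile(), catch
EINVAL/EBADF, then use a regular copy loop) could look roughly like the
sketch below. The name copy_with_sendfile and the 16 MiB chunk size are
hypothetical. Treating ENOTSOCK as a fallback trigger is an addition of this
sketch, for platforms where the output must be a socket; the thread itself
only mentions EINVAL/EBADF.

```python
import errno
import os
import shutil

# errno values treated as "sendfile() cannot handle these descriptors".
# EINVAL/EBADF come from the thread; ENOTSOCK is this sketch's assumption.
_FALLBACK_ERRNOS = (errno.EINVAL, errno.EBADF, errno.ENOTSOCK)


def copy_with_sendfile(src_path, dst_path, chunk=16 * 1024 * 1024):
    """Copy file contents; return True if sendfile() did the whole copy."""
    with open(src_path, "rb") as fsrc, open(dst_path, "wb") as fdst:
        sendfile = getattr(os, "sendfile", None)  # absent on some platforms
        offset = 0
        if sendfile is not None:
            try:
                while True:
                    sent = sendfile(fdst.fileno(), fsrc.fileno(),
                                    offset, chunk)
                    if sent == 0:          # end of the source file
                        return True
                    offset += sent
            except OSError as exc:
                if exc.errno not in _FALLBACK_ERRNOS:
                    raise
        # Regular read()/write() loop, resuming where sendfile() stopped.
        fsrc.seek(offset)
        fdst.seek(offset)
        shutil.copyfileobj(fsrc, fdst)
        return False
```

As noted in the quoted text, any gain is likely to be largest on tmpfs or
when source and target sit on different disks; with a single slow device the
I/O itself will probably dominate.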