[Python-ideas] speeding up shutil.copy*()

Charles-François Natali cf.natali at gmail.com
Sun Mar 3 18:02:57 CET 2013


The shutil.copy*() functions use copyfileobj(), whose main loop is:
"""
    while 1:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)
"""

This allocates and frees a new buffer on every iteration. That could be
avoided by reading into a single preallocated buffer with readinto().
Unfortunately, I don't think we can change copyfileobj() itself, because
it might be passed objects that don't implement readinto().
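
To make the idea concrete, here is a minimal sketch of a readinto()-based
loop. It is not the actual patch: the name copyfileobj_readinto and the
default buffer size are assumptions chosen for illustration, and it only
works with binary file objects that implement readinto().

```python
def copyfileobj_readinto(fsrc, fdst, length=16 * 1024):
    """Copy fsrc to fdst through one reusable buffer (hypothetical helper)."""
    buf = bytearray(length)      # allocated once, reused for every chunk
    view = memoryview(buf)       # slicing a memoryview does not copy
    while True:
        n = fsrc.readinto(view)  # fills the buffer, returns the byte count
        if not n:                # 0 at end of file
            break
        fdst.write(view[:n])     # write only the bytes actually read
```

Both files have to be opened in binary mode, e.g.
copyfileobj_readinto(open(src, 'rb'), open(dst, 'wb')).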

By implementing it directly in copyfile() (it would probably be better
to expose it in shutil to make it available to tarfile & Co), there's
a modest improvement:

$ dd if=/dev/zero of=/tmp/foo bs=1M count=100

Without patch:
$ ./python -m timeit -s "import shutil" "shutil.copyfile('/tmp/foo',
'/dev/null')"
10 loops, best of 3: 218 msec per loop

With readinto():
$ ./python -m timeit -s "import shutil" "shutil.copyfile('/tmp/foo',
'/dev/null')"
10 loops, best of 3: 202 msec per loop

(I'm using /dev/null as the target because my hdd is really slow. Other
benchmarks are welcome; just beware that /tmp might be tmpfs.)

I've also written a dirty patch to use sendfile(). Here, the
improvement is really significant:

With sendfile():
$ ./python -m timeit -s "import shutil" "shutil.copyfile('/tmp/foo',
'/dev/null')"
100 loops, best of 3: 5.39 msec per loop
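
For reference, a rough sketch of what a sendfile()-based copy could look
like. This is not the patch measured above: copyfile_sendfile and its
blocksize default are hypothetical, and it assumes a platform where
os.sendfile() accepts a regular file as the output descriptor (recent
Linux kernels do). If the very first call fails, it falls back to a plain
read()/write() loop.

```python
import os

def copyfile_sendfile(src, dst, blocksize=2 ** 26):
    """Copy src to dst with os.sendfile() (hypothetical helper)."""
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        infd, outfd = fsrc.fileno(), fdst.fileno()
        offset = 0
        while True:
            try:
                # The kernel moves the data; nothing passes through Python.
                sent = os.sendfile(outfd, infd, offset, blocksize)
            except (AttributeError, OSError):
                if offset:
                    raise  # part of the file was already copied
                # sendfile() missing or unsupported here: userspace copy.
                for chunk in iter(lambda: fsrc.read(16 * 1024), b""):
                    fdst.write(chunk)
                return
            if sent == 0:
                break  # end of file reached
            offset += sent
```

A real implementation would need to be more careful about which errors
trigger the fallback; catching every OSError is only acceptable in a
sketch.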

Thoughts?

cf


