[Python-ideas] speeding up shutil.copy*()

Serhiy Storchaka storchaka at gmail.com
Mon Mar 4 17:18:17 CET 2013


On 03.03.13 19:02, Charles-François Natali wrote:
> Without patch:
> $ ./python -m timeit -s "import shutil" "shutil.copyfile('/tmp/foo',
> '/dev/null')"
> 10 loops, best of 3: 218 msec per loop
>
> With readinto():
> $ ./python -m timeit -s "import shutil" "shutil.copyfile('/tmp/foo',
> '/dev/null')"
> 10 loops, best of 3: 202 msec per loop

8%. Note that in real cases the difference will be significantly less. 
First, output to a real file takes more time than output to /dev/null. 
Second, you are unlikely to copy the same input file 30 times in a row; 
only the first run of the test reads from disk, the other 29 read from 
the page cache. Third, a source such as tarfile has several layers 
between user code and the disk file: buffered I/O, GzipFile, and the 
internal tarfile wrapper. Every layer adds some overhead, and in sum 
this is many times larger than the cost of creating one bytes object.
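
For reference, the readinto() approach presumably amounts to something 
like the loop below (a sketch, not the actual patch; the function name 
copyfileobj_readinto is made up here). The point is that one 
preallocated buffer is reused instead of a new bytes object being 
created on every read():

    import io

    def copyfileobj_readinto(fsrc, fdst, length=io.DEFAULT_BUFFER_SIZE):
        # Reuse a single preallocated buffer across all reads.
        buf = bytearray(length)
        view = memoryview(buf)
        while True:
            n = fsrc.readinto(buf)
            if not n:
                break
            # Write only the bytes actually read (last read may be short).
            fdst.write(view[:n])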

> With sendfile():
> $ ./python -m timeit -s "import shutil" "shutil.copyfile('/tmp/foo',
> '/dev/null')"
> 100 loops, best of 3: 5.39 msec per loop

This looks more interesting.
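
For context, a sendfile()-based copyfile could look roughly like this. 
This is only a sketch using os.sendfile() (available since Python 3.3), 
not the proposed patch itself, and it assumes Linux, where sendfile() 
can write to a regular file; the data is copied in the kernel and never 
passes through a Python-level buffer:

    import os

    def copyfile_sendfile(src, dst):
        with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
            infd = fsrc.fileno()
            outfd = fdst.fileno()
            # Ask for the whole file at once; fall back to 1 MiB chunks.
            blocksize = os.fstat(infd).st_size or 2 ** 20
            offset = 0
            while True:
                sent = os.sendfile(outfd, infd, offset, blocksize)
                if sent == 0:
                    break  # reached EOF
                offset += sent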

There is another idea for speeding up tarfile extraction: use the dir_fd 
parameter (where it is available) when opening the target files. That 
can speed up extraction of a large number of small, deeply nested files, 
whereas sendfile() should only speed up extraction of large files.
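
A rough sketch of the dir_fd idea (not tarfile's actual code; the names 
extract_into and dir_fd handling here are illustrative). The extraction 
directory is opened once, and each member is then opened relative to 
that descriptor, so the kernel does not have to walk the full target 
path for every file:

    import os
    import shutil

    def extract_into(dir_fd, relname, fsrc):
        # Resolve relname relative to the already-open directory fd.
        fd = os.open(relname, os.O_WRONLY | os.O_CREAT | os.O_TRUNC,
                     0o644, dir_fd=dir_fd)
        with os.fdopen(fd, 'wb') as fdst:
            shutil.copyfileobj(fsrc, fdst)

    # Usage sketch: open the target directory once, reuse it for all
    # members of the archive, then close it.
    #
    #   dir_fd = os.open('/target/dir', os.O_RDONLY)
    #   for member in tar:
    #       extract_into(dir_fd, member.name, tar.extractfile(member))
    #   os.close(dir_fd)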



