[Python-ideas] speeding up shutil.copy*()
Serhiy Storchaka
storchaka at gmail.com
Mon Mar 4 17:18:17 CET 2013
On 03.03.13 19:02, Charles-François Natali wrote:
> Without patch:
> $ ./python -m timeit -s "import shutil" "shutil.copyfile('/tmp/foo',
> '/dev/null')"
> 10 loops, best of 3: 218 msec per loop
>
> With readinto():
> $ ./python -m timeit -s "import shutil" "shutil.copyfile('/tmp/foo',
> '/dev/null')"
> 10 loops, best of 3: 202 msec per loop
8%. Note that in real cases the difference will be significantly less.
First, writing to a real file takes more time than writing to /dev/null.
Second, you are unlikely to copy the same input file 30 times in a row.
Only the first read in the test comes from disk; the other 29 come from
the cache. Third, sources such as tarfile have several layers between
the user code and the disk file: BufferedIO, GzipFile, the internal
tarfile wrapper. Every layer adds some overhead, and in sum that will be
many times larger than the creation of one bytes object.
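For context, the readinto() approach being benchmarked boils down to
reusing one preallocated buffer instead of letting read() allocate a
fresh bytes object per chunk. A minimal sketch (not the actual patch;
the function name and buffer size are illustrative):

```python
def copyfileobj_readinto(fsrc, fdst, length=64 * 1024):
    # One buffer is allocated up front and reused for every chunk,
    # instead of read() creating a new bytes object each iteration.
    buf = bytearray(length)
    view = memoryview(buf)
    while True:
        n = fsrc.readinto(buf)
        if not n:
            break
        # Slice the view so the final, partial chunk is written
        # without copying the buffer.
        fdst.write(view[:n])
```

The saving per chunk is just one object allocation, which is why the
layers above (BufferedIO, GzipFile) can easily swamp it.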
> With sendfile():
> $ ./python -m timeit -s "import shutil" "shutil.copyfile('/tmp/foo',
> '/dev/null')"
> 100 loops, best of 3: 5.39 msec per loop
This looks more interesting.
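sendfile() copies the data entirely inside the kernel, never moving it
through user space, which explains the order-of-magnitude difference.
A hedged sketch of how os.sendfile (available on Linux since Python
3.3) could drive such a copy; the function name is made up for
illustration:

```python
import os

def copyfile_sendfile(src, dst):
    # Zero-copy transfer: the kernel moves bytes from in_fd to out_fd
    # directly, so no user-space buffer is involved at all.
    with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
        in_fd = fsrc.fileno()
        out_fd = fdst.fileno()
        size = os.fstat(in_fd).st_size
        offset = 0
        while offset < size:
            # sendfile may transfer fewer bytes than requested,
            # so loop until the whole file has been sent.
            sent = os.sendfile(out_fd, in_fd, offset, size - offset)
            if sent == 0:
                break
            offset += sent
```

Note that sendfile() to a regular-file destination is a Linux-specific
capability, so a real shutil patch would need a read/write fallback.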
There is another idea for speeding up tarfile extraction: use the
dir_fd parameter (where available) when opening target files. That can
speed up extraction of a large number of small, deeply nested files,
whereas sendfile() should only speed up extraction of large files.
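The dir_fd idea: open the target directory once, then open member
files relative to that descriptor, so the kernel does not re-walk the
leading path components for every small file. A minimal sketch under
the assumption that os.open supports dir_fd on the platform (POSIX
only; the helper name is hypothetical):

```python
import os

def open_in_dir(dirpath, name, mode=0o644):
    # Resolve 'name' relative to an already-open directory fd,
    # skipping the repeated lookup of the directory's own path.
    dir_fd = os.open(dirpath, os.O_RDONLY)
    try:
        fd = os.open(name, os.O_WRONLY | os.O_CREAT | os.O_TRUNC,
                     mode, dir_fd=dir_fd)
        return os.fdopen(fd, 'wb')
    finally:
        os.close(dir_fd)
```

For a real extractor you would keep one dir_fd per directory alive
across all of its members, which is where the saving for deep trees
comes from.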