[Python-ideas] speeding up shutil.copy*()
Charles-François Natali
cf.natali at gmail.com
Mon Mar 4 20:21:35 CET 2013
> 8%. Note that in real cases the difference will be significantly less. First,
> output to a real file takes more time than output to /dev/null. Second, you
> are unlikely to copy the same input file 30 times in a row. Only the first
> time in the test do you read from disk; the other 29 times you read from the
> cache. Third, sources such as tarfile have several layers between user code
> and the disk file: BufferedIO, GzipFile, the internal tarfile wrapper. Every
> layer adds some overhead, and in sum this will be many times larger than
> creating one bytes object.
I know, I said it was really biased :-)
The proper way to perform a cold cache benchmark would be "echo 3 >
/proc/sys/vm/drop_caches" before reading the file.
The goal was simply to highlight the reallocation overhead (whose cost can
vary depending on the implementation).
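Something like this would do it (a rough sketch; it assumes Linux, root
privileges for the drop_caches write, and reuses the /tmp/foo test file from
the example below):

$ sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
$ ./python -m timeit -n 1 -r 1 -s "import shutil" "shutil.copyfile('/tmp/foo', '/dev/null')"

Note the -n 1 -r 1: only the very first iteration reads from a cold cache, so
letting timeit repeat the statement would defeat the purpose.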
>> With sendfile():
>> $ ./python -m timeit -s "import shutil" "shutil.copyfile('/tmp/foo',
>> '/dev/null')"
>> 100 loops, best of 3: 5.39 msec per loop
>
>
> This looks more interesting.
Not really, because as above, the extra syscalls and copy loops
aren't really the bottleneck; it's still the I/O (try replacing
/dev/null with an on-disk file and the gain plummets; it might be
different if the source and target files are on different disks,
though). Zero-copy really shines when writing data to a socket: a more
interesting use would be in ftplib & Co.
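For instance, here is a minimal sketch of what a sendfile()-based upload
helper could look like (assuming a connected, blocking socket and Python
3.3's os.sendfile(); the helper name is made up for illustration, it's not an
actual ftplib API):

import os

def send_file_over_socket(sock, path):
    # Hypothetical helper: push a file over a connected socket with
    # os.sendfile() when available (zero-copy), falling back to a plain
    # read()/sendall() loop otherwise.
    with open(path, 'rb') as f:
        if not hasattr(os, 'sendfile'):
            # Userspace copy loop fallback (extra copies through Python).
            while True:
                chunk = f.read(64 * 1024)
                if not chunk:
                    break
                sock.sendall(chunk)
            return
        size = os.fstat(f.fileno()).st_size
        offset = 0
        while offset < size:
            # sendfile() moves data straight from the page cache to the
            # socket buffer, skipping the read()/write() round trip.
            sent = os.sendfile(sock.fileno(), f.fileno(), offset,
                               size - offset)
            if sent == 0:
                break  # file truncated under us
            offset += sent

Something along those lines in ftplib's storbinary() path is where the
savings would actually be visible.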
cf