shutil zero-copy and exotic filesystems

Hello,
I've been working on a patch which speeds up shutil.copy* operations on all 3 major platforms (Linux, Windows, OSX):
https://bugs.python.org/issue33671
Since the speedup is quite consistent I'd love to see this merged in, but considering how crucial shutil.copy* is I wanted to hear other folks' opinions first. The attached patch attempts to use platform-specific zero-copy syscalls [1] by default and falls back to the plain read() / write() variant in case of immediate failure. In theory this should work fine; in practice I haven't tested it on exotic (e.g. network) filesystems. In order to reduce the risk of breakage I think it would make sense to:
- add a global shutil.NO_ZEROCOPY variable defaulting to False
- add a "no_zerocopy" argument to all functions involving a copy (copyfile(), copy(), copy2(), copytree(), move())
Thoughts?
[1] sendfile() (Linux), fcopyfile() (OSX), CopyFileW (Windows)
since the matter is a bit sensitive in terms of potential breakage on exotic / untested (e.g. network) filesystems I want to raise some attention about:
https://bugs.python.org/issue33671
The attached patch attempts to use platform-specific zero-copy syscalls by default and falls back to a plain read() / write() copy in case of immediate failure. In order to reduce the risks I think it would make sense to:
- add a global shutil.NO_ZEROCOPY variable defaulting to False
- add a "no_zerocopy" argument to all functions involving a copy (copyfile(), copy(), copy2(), copytree(), move())
Thoughts?
-- Giampaolo - http://grodola.blogspot.com
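The "try zero-copy first, fall back on immediate failure" idea from the message above could be sketched roughly as follows. This is not the code from the patch: copyfile_sketch and the module-level NO_ZEROCOPY here are hypothetical stand-ins for the proposed argument and global, and only the os.sendfile() path is shown (it is skipped on platforms where os.sendfile does not exist, and it falls back where the call is rejected for regular files).

```python
import os
import shutil

# Hypothetical module-level switch, standing in for the proposed shutil.NO_ZEROCOPY.
NO_ZEROCOPY = False


def copyfile_sketch(src, dst, no_zerocopy=False, bufsize=1024 * 1024):
    """Copy src to dst, trying os.sendfile() before a plain read()/write() loop."""
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        if hasattr(os, "sendfile") and not (no_zerocopy or NO_ZEROCOPY):
            offset = 0
            try:
                while True:
                    # Passing an explicit offset leaves fsrc's own file position
                    # untouched, so the fallback below still starts at byte 0.
                    sent = os.sendfile(fdst.fileno(), fsrc.fileno(), offset, bufsize)
                    if sent == 0:  # end of file reached
                        return dst
                    offset += sent
            except OSError:
                if offset != 0:
                    # Failure after some data was written is not an
                    # "immediate" failure, so it is not hidden.
                    raise
                # Immediate failure (e.g. unsupported filesystem): fall through.
        shutil.copyfileobj(fsrc, fdst, bufsize)
    return dst
```

Calling copyfile_sketch(src, dst, no_zerocopy=True), or setting NO_ZEROCOPY = True, forces the plain copy path, which is the escape hatch the proposal describes for filesystems where the fast path misbehaves.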

Whoops, I hit "send" too soon. Sorry about the messed up message. On Tue, May 29, 2018 at 10:56 AM, Giampaolo Rodola' <g.rodola@gmail.com> wrote:
-- Giampaolo - http://grodola.blogspot.com

I have been doing speed testing on our network frequently enough over the last five years that I have a good feel for these changes. I have been benchmarking various means of copying files of 1.0-500 MB from workstations on our 1 Gbps LAN to local servers, and on our WAN, which was upgraded two months ago from a 7x T1 (11.5 Mbps) to 100 Mbps fiber. This is an all-Windows network: workstations are Win7/10 and servers are Win Server 2003 (!!!), 2008, 2012 and 2016.
The difference between open(dst).write(open(src).read()) and CopyFileW can be pretty significant, especially for older Win Server versions. I see maybe a 4x speedup using CopyFileW to those machines with larger files, so it's a big win. On newer Windows Server versions it becomes sort of a wash, with both ways working better than anything else (much faster than command-line utilities like DOS "copy" and Cygwin "cp", or Windows file explorer drag and drop, and very much faster than a C-code write(read()) loop).
Here's an example I just ran for a Win Server 2003 copy.
    2018-05-29T07:45:09 PDT
    ---Using CopyFileW
    Copy 'xmas.mpg' to 'p:/build/' (28,872,704 bytes)
    1/3 - 0.27 s ( 905.2 Mbps)
    2/3 - 0.25 s ( 983.9 Mbps)
    3/3 - 0.26 s ( 968.6 Mbps)
    Theoretical: 0.25 s at 1000.0 Mbps
    Best time  : 0.25 s at 983.9 Mbps
    ---Using write(read())
    Copy 'xmas.mpg' to 'p:/build/' (28,872,704 bytes)
    1/3 - 1.47 s ( 169.9 Mbps)
    2/3 - 1.21 s ( 205.0 Mbps)
    3/3 - 1.22 s ( 203.4 Mbps)
    Theoretical: 0.25 s at 1000.0 Mbps
    Best time  : 1.21 s at 205.0 Mbps
On our WAN, which has a VPN endpoint 3000 miles from our office, routing back to a test server another 2000 miles inside the network (tracert shows 12-15 hops, 200 ms latency, arrrg), copying is slow no matter what: we are lucky to see 40 Mbps on a connection whose slowest link is 100 Mbps.
The biggest improvement we saw on WAN copies was when Ben Hoyt built scandir on top of Windows primitives. We were doing a copytree using os.walk that would take almost 20 (twenty) minutes just to build the tree before it copied the first file. When scandir was first released, I rewrote my code with fingers crossed that it would help a bit. On my first test, it took something like 15-20 seconds to build the tree and start copying. I was sure I had screwed up and broken it, but was quite shocked when I saw that it was in fact working correctly.
Bottom line is +1 on switching to OS-specific copy mechanisms. Python is already the best* language for this sort of work (I've tried C/C++ and C# as alternatives), and this will make it even better.
*Best for me in this particular case is "resulting code is fastest, ease of implementation is secondary."
On Tue, May 29, 2018 at 1:59 AM Giampaolo Rodola' <g.rodola@gmail.com> wrote:
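A best-of-N timing comparison like the one reported above could be reproduced with a small harness along these lines. This is only a sketch under assumptions: the script behind the quoted log is not shown in the thread, so mbps, copy_readwrite and time_copy are hypothetical names, Mbps is taken to mean 1,000,000 bits per second, and shutil.copyfile is suggested below as a stand-in for whichever call was actually used to reach CopyFileW.

```python
import shutil
import time


def mbps(nbytes, seconds):
    """Throughput in megabits per second, assuming 1 Mbps = 1,000,000 bits/s."""
    return nbytes * 8 / seconds / 1e6


def copy_readwrite(src, dst):
    """Naive whole-file copy: read everything into memory, then write it out."""
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        fdst.write(fsrc.read())


def time_copy(copy_func, src, dst, repeats=3):
    """Call copy_func(src, dst) `repeats` times; return the best time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        copy_func(src, dst)
        best = min(best, time.perf_counter() - start)
    return best
```

For a given file, time_copy(copy_readwrite, src, dst) and time_copy(shutil.copyfile, src, dst) can then be compared, and mbps(os.path.getsize(src), best) (with os imported) converts a best time to a rate. Results depend heavily on caching and on the destination filesystem, so several file sizes are worth trying.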

On Tuesday, 29 May 2018 16:42:13 BST Eric Fahlgren wrote: snip...
This is a guess on my part, I may be way off base. Maybe you are suffering from TCP window scaling not working well enough? This page claims there are bugs in the Windows implementations. It's also claimed, elsewhere, that some middle boxes mess up TCP window scaling.
http://web.archive.org/web/20120217135039/http://fasterdata.es.net:80/fasterdata/host-tuning/ms-windows
Wikipedia has a good description of TCP window scaling.
Barry
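To put rough numbers on the window-scaling guess: a single TCP connection can have at most one window of data in flight per round trip, so throughput is bounded by window size divided by round-trip time. The sketch below works that out for the figures quoted earlier in the thread. It assumes the 200 ms latency is a round-trip time and that the copy runs over a single TCP connection, neither of which the thread confirms; the function names are hypothetical.

```python
def max_throughput_mbps(window_bytes, rtt_seconds):
    """Upper bound on throughput with one window in flight per round trip."""
    return window_bytes * 8 / rtt_seconds / 1e6


def window_needed_bytes(target_mbps, rtt_seconds):
    """Bandwidth-delay product: bytes in flight needed to sustain target_mbps."""
    return target_mbps * 1e6 * rtt_seconds / 8


RTT = 0.200  # seconds; assumption: the 200 ms from the thread is a round-trip time

# Without window scaling the TCP window field is 16 bits (at most 65,535 bytes),
# which caps a 200 ms path at roughly 2.6 Mbps.
print(max_throughput_mbps(65535, RTT))

# Filling the 100 Mbps link needs roughly 2.5 MB in flight;
# the observed 40 Mbps corresponds to roughly 1 MB in flight.
print(window_needed_bytes(100, RTT))
print(window_needed_bytes(40, RTT))
```

Under those assumptions, seeing 40 Mbps already implies an effective window well above 64 KB, so scaling appears to be working to some degree; the question would be whether something limits the window to about 1 MB rather than the roughly 2.5 MB the link could use.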

On Tue, May 29, 2018 at 10:17 AM Barry Scott <barry@barrys-emacs.org> wrote:
On Tuesday, 29 May 2018 16:42:13 BST Eric Fahlgren wrote:
Maybe you are suffering from TCP window scaling not working well enough?
Thanks for the tip, I'll have to mention that to our IT infrastructure guys and see if they can check it out.

participants (3)
- Barry Scott
- Eric Fahlgren
- Giampaolo Rodola'