[New-bugs-announce] [issue41586] Allow to set pipe size on subprocess.Popen.
report at bugs.python.org
Wed Aug 19 01:46:00 EDT 2020
New submission from Ruben Vorderman <r.h.p.vorderman at lumc.nl>:
Pipes block when reading from an empty pipe or writing to a full pipe. When this happens, the program waiting on the pipe still burns a lot of CPU cycles while it waits for the pipe to stop blocking.
I found this while working on xopen, a library that pipes data into an external gzip process. (This is more efficient than using Python's gzip module, because the subprocess escapes the GIL: your main algorithm can fully utilize one CPU core while the compression is offloaded to another.)
It turns out that increasing the pipe size on Linux from the default of 64 KB to the maximum allowed pipe size in /proc/sys/fs/pipe-max-size (1024 KB) drastically improves performance: https://github.com/marcelm/xopen/issues/35. TL;DR: full utilization of CPU cores, a 40%+ decrease in wall-clock time, and a 20% decrease in the number of compute seconds (indicating that 20% of CPU time was wasted waiting on blocking pipes).
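For reference, the system-wide maximum that an unprivileged process may request can be read from procfs (a minimal sketch, assuming Linux with procfs mounted; the path and units are those documented for the kernel, not anything added by this proposal):

```python
# Read the maximum pipe capacity (in bytes) an unprivileged process may
# set on this Linux system. On typical systems this is 1048576 (1 MB).
with open("/proc/sys/fs/pipe-max-size") as f:
    max_pipe_size = int(f.read())

print(max_pipe_size)
```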
However, doing this with subprocess is currently quite involved:
1. You have to find out which constants to pass to fcntl for setting the pipe size (these constants are not available in Python).
2. You have to start the Popen process with stdout routed to subprocess.PIPE.
3. You have to get my_popen_process.stdout.fileno().
4. You have to use fcntl.fcntl to modify the pipe size.
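The four steps above can be sketched as follows (assuming Linux; the F_SETPIPE_SZ/F_GETPIPE_SZ values are hard-coded from <linux/fcntl.h> because, as noted in step 1, the fcntl module does not currently export them):

```python
import fcntl
import subprocess

# Step 1: Linux-only fcntl commands, hard-coded because the fcntl
# module does not expose them (values taken from <linux/fcntl.h>).
F_SETPIPE_SZ = 1031
F_GETPIPE_SZ = 1032

# Step 2: start the process with a pipe attached to its stdin.
proc = subprocess.Popen(
    ["gzip", "-c"],
    stdin=subprocess.PIPE,
    stdout=subprocess.DEVNULL,
)

# Steps 3 and 4: get the pipe's file descriptor and grow the pipe
# from the 64 KB default to 1 MB.
fcntl.fcntl(proc.stdin.fileno(), F_SETPIPE_SZ, 1024 * 1024)
new_size = fcntl.fcntl(proc.stdin.fileno(), F_GETPIPE_SZ)

proc.stdin.close()
proc.wait()
```

The same dance applies to stdout/stderr pipes; only the file descriptor changes.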
It would be much easier to do, for example, `subprocess.Popen(args, pipesize=1024 * 1024)`.
I am currently working on a PR that implements this. It will also make F_GETPIPE_SZ and F_SETPIPE_SZ available in the fcntl module.
components: Library (Lib)
title: Allow to set pipe size on subprocess.Popen.
versions: Python 3.10