Re: [Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code

Sept. 8, 2017

On Thu, Sep 7, 2017 at 6:14 PM, Matthew Rocklin <mrocklin@gmail.com> wrote:
...
Those numbers were for common use in Python tools and reflected my anecdotal
experience at the time with normal Python tools.  I'm sure that there are
mechanisms to achieve faster speeds than what I experienced.  That being
said, here is a small example.

In [1]: import multiprocessing
In [2]: data = b'0' * 100000000  # 100 MB
In [3]: from toolz import identity
In [4]: pool = multiprocessing.Pool()
In [5]: %time _ = pool.apply_async(identity, (data,)).get()
CPU times: user 76 ms, sys: 64 ms, total: 140 ms
Wall time: 252 ms

This is about 400MB/s for a roundtrip

Awesome, thanks for bringing numbers into my wooly-headed theorizing :-).

On my laptop I actually get a worse result from your benchmark: 531 ms
for 100 MB == ~200 MB/s round-trip, or ~400 MB/s one-way, since the
payload crosses the process boundary twice. So yeah, transferring data
between processes with multiprocessing is slow.
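
The round-trip and one-way figures are plain arithmetic on the measured wall times. A small sketch of that calculation follows; the helper name `throughput_mb_per_s` and its `crossings` parameter are made up here for illustration and are not part of the benchmark.

```python
def throughput_mb_per_s(payload_mb, seconds, crossings=1):
    """Megabytes moved per second.

    crossings=1 counts the payload once (the round-trip rate);
    crossings=2 counts it once per direction (the one-way rate).
    """
    return payload_mb * crossings / seconds


print(round(throughput_mb_per_s(100, 0.252)))     # quoted run, round-trip: 397
print(round(throughput_mb_per_s(100, 0.531)))     # laptop run, round-trip: 188
print(round(throughput_mb_per_s(100, 0.531, 2)))  # laptop run, one-way: 377
```

So "~200 MB/s" and "400 MB/s" are rounded versions of roughly 188 and 377 MB/s.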

This is odd, though, because on the same machine, using socat to send
1 GiB between processes using a unix domain socket runs at 2 GB/s:

# terminal 1
~$ rm -f /tmp/unix.sock && socat -u -b32768 UNIX-LISTEN:/tmp/unix.sock \
     "SYSTEM:pv -W > /dev/null"
1.00GiB 0:00:00 [1.89GiB/s] [<=>                                               ]

# terminal 2
~$ socat -u -b32768 "SYSTEM:dd if=/dev/zero bs=1M count=1024" \
     UNIX:/tmp/unix.sock
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.529814 s, 2.0 GB/s

(Notice that the pv output is in GiB/s and the dd output is in GB/s.
1.89 GiB/s = 2.03 GB/s, so they actually agree.)
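
The unit conversion can be checked in a couple of lines. This is only a sketch of the arithmetic; the constant and function names are chosen here for illustration.

```python
GIB = 2**30   # bytes in one gibibyte (binary unit, as reported by pv)
GB = 10**9    # bytes in one gigabyte (decimal unit, as reported by dd)


def gib_to_gb(value_gib):
    """Convert a quantity expressed in GiB (or GiB/s) to GB (or GB/s)."""
    return value_gib * GIB / GB


print(round(gib_to_gb(1.89), 2))  # pv's 1.89 GiB/s in GB/s: 2.03

# Recomputing dd's rate from its own byte count and elapsed time.
dd_rate_gb = 1073741824 / 0.529814 / GB
print(round(dd_rate_gb, 2))       # 2.03, which dd displays as 2.0 GB/s
```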

On my system, Python allocates + copies memory at 2.2 GB/s, so bulk
byte-level IPC is within 10% of within-process bulk copying:

# same 100 MB bytestring as above
In [7]: bytearray_data = bytearray(data)

In [8]: %timeit bytearray_data.copy()
45.3 ms ± 540 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [9]: 0.100 / 0.0453  # GB / seconds
Out[9]: 2.207505518763797
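
For anyone who wants to repeat the copy measurement outside IPython, here is a rough stdlib-only sketch. The function name `copy_throughput_gb_per_s` is hypothetical, and it reports the best of several runs rather than the mean that %timeit prints, so the numbers will not match exactly.

```python
import timeit


def copy_throughput_gb_per_s(nbytes, repeats=5):
    """Time bytearray.copy() on an nbytes buffer and return GB/s.

    Uses the fastest of `repeats` single-copy timings, which differs
    from the mean +/- std. dev. that IPython's %timeit reports.
    """
    data = bytearray(nbytes)
    best = min(timeit.repeat(data.copy, number=1, repeat=repeats))
    return nbytes / best / 1e9


if __name__ == "__main__":
    # 100 MB, matching the size used in the session above.
    print(f"{copy_throughput_gb_per_s(100_000_000):.2f} GB/s")
```

The result depends heavily on the machine, so treat any single figure as indicative only.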

I don't know why multiprocessing is so slow -- maybe there's a good
reason, maybe not. But the reason isn't that IPC is intrinsically
slow, and subinterpreters aren't going to automatically be 5x faster
because they can use memcpy.
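
As a rough way to measure raw byte-level IPC from Python itself, without socat or multiprocessing in the picture, one could push bytes from a forked child to its parent over a Unix socket pair. This is a minimal sketch under stated assumptions: it is Unix-only (it relies on os.fork), the function name `socketpair_throughput` is made up for illustration, and it was not part of the measurements above.

```python
import os
import socket
import time


def socketpair_throughput(nbytes, chunk=32768):
    """Send nbytes from a forked child to the parent over a socket pair.

    Returns (bytes_received, elapsed_seconds). Unix-only sketch.
    """
    parent_sock, child_sock = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        # Child: write the payload, then exit immediately so that no
        # parent-side cleanup code runs a second time.
        try:
            parent_sock.close()
            child_sock.sendall(bytes(nbytes))
            child_sock.close()
        finally:
            os._exit(0)

    # Parent: close the unused end so EOF arrives once the child is done.
    child_sock.close()
    buf = bytearray(chunk)
    received = 0
    start = time.perf_counter()
    while True:
        n = parent_sock.recv_into(buf)
        if n == 0:  # EOF: the child closed its end
            break
        received += n
    elapsed = time.perf_counter() - start
    parent_sock.close()
    os.waitpid(pid, 0)  # reap the child
    return received, elapsed


if __name__ == "__main__":
    size = 100_000_000  # 100 MB, one-way
    got, secs = socketpair_throughput(size)
    print(f"{got / secs / 1e9:.2f} GB/s one-way")
```

The timing includes the child's allocation of the payload, so it slightly understates the transfer rate; it is meant only as a point of comparison against the multiprocessing numbers.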

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
