[Python-ideas] PEP 554: Stdlib Module to Support Multiple Interpreters in Python Code

Nathaniel Smith njs at pobox.com
Fri Sep 8 00:08:48 EDT 2017


On Thu, Sep 7, 2017 at 6:14 PM, Matthew Rocklin <mrocklin at gmail.com> wrote:
> Those numbers were for common use in Python tools and reflected my anecdotal
> experience at the time with normal Python tools.  I'm sure that there are
> mechanisms to achieve faster speeds than what I experienced.  That being
> said, here is a small example.
>
>
> In [1]: import multiprocessing
> In [2]: data = b'0' * 100000000  # 100 MB
> In [3]: from toolz import identity
> In [4]: pool = multiprocessing.Pool()
> In [5]: %time _ = pool.apply_async(identity, (data,)).get()
> CPU times: user 76 ms, sys: 64 ms, total: 140 ms
> Wall time: 252 ms
>
> This is about 400 MB/s for a roundtrip.

Awesome, thanks for bringing numbers into my wooly-headed theorizing :-).

On my laptop I actually get a worse result from your benchmark: 531 ms
for 100 MB == ~200 MB/s round-trip, or 400 MB/s one-way. So yeah,
transferring data between processes with multiprocessing is slow.
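
For anyone who wants to reproduce this outside IPython, here's a
self-contained sketch of the same benchmark, with a plain
module-level function standing in for toolz.identity and
time.perf_counter in place of %time:

import multiprocessing
import time

def identity(x):
    # stand-in for toolz.identity
    return x

if __name__ == "__main__":
    data = b'0' * 100000000  # same 100 MB payload
    with multiprocessing.Pool() as pool:
        # warm up so worker startup isn't counted
        pool.apply(identity, (b'warmup',))
        start = time.perf_counter()
        pool.apply_async(identity, (data,)).get()
        elapsed = time.perf_counter() - start
        print("round-trip: %.0f ms, %.0f MB/s"
              % (elapsed * 1e3, 100 / elapsed))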

This is odd, though, because on the same machine, sending 1 GiB
between processes with socat over a Unix domain socket runs at 2 GB/s:

# terminal 1
~$ rm -f /tmp/unix.sock && socat -u -b32768 UNIX-LISTEN:/tmp/unix.sock "SYSTEM:pv -W > /dev/null"
1.00GiB 0:00:00 [1.89GiB/s] [<=>                                               ]

# terminal 2
~$ socat -u -b32768 "SYSTEM:dd if=/dev/zero bs=1M count=1024" UNIX:/tmp/unix.sock
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.529814 s, 2.0 GB/s

(Notice that the pv output is in GiB/s and the dd output is in GB/s.
1.89 GiB/s = 2.03 GB/s, so they actually agree.)
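
For completeness, a rough Python-level version of the same
measurement, a sketch assuming a POSIX system (socketpair + fork),
looks like this:

import os
import socket
import time

CHUNK = 32768          # match socat's -b32768
TOTAL = 1 << 30        # 1 GiB

parent_sock, child_sock = socket.socketpair()
if os.fork() == 0:
    # Child: pump TOTAL bytes of zeros into the socket.
    parent_sock.close()
    buf = b'\0' * CHUNK
    sent = 0
    while sent < TOTAL:
        sent += child_sock.send(buf)
    child_sock.close()
    os._exit(0)

# Parent: drain the socket and time it.
child_sock.close()
view = bytearray(CHUNK)
received = 0
start = time.perf_counter()
while received < TOTAL:
    n = parent_sock.recv_into(view)
    if n == 0:
        break
    received += n
elapsed = time.perf_counter() - start
print("%.2f GB/s" % (received / elapsed / 1e9))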

On my system, Python allocates + copies memory at 2.2 GB/s, so bulk
byte-level IPC is within 10% of within-process bulk copying:

# same 100 MB bytestring as above
In [7]: bytearray_data = bytearray(data)

In [8]: %timeit bytearray_data.copy()
45.3 ms ± 540 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [9]: 0.100 / 0.0453  # GB / seconds
Out[9]: 2.207505518763797

I don't know why multiprocessing is so slow -- maybe there's a good
reason, maybe not. But the reason isn't that IPC is intrinsically
slow, and subinterpreters aren't going to automatically be 5x faster
because they can use memcpy.
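
(One obvious thing to check would be how much of the time is just
serialization, since multiprocessing pickles arguments and results in
both directions. A quick sketch:

import pickle
import timeit

data = b'0' * 100000000  # same 100 MB payload

# Time a pickle round-trip of the payload by itself.
trip = timeit.timeit(lambda: pickle.loads(pickle.dumps(data)),
                     number=10) / 10
print("pickle round-trip: %.0f ms" % (trip * 1e3))

If that accounts for a big chunk of the 531 ms, the bottleneck is
copying in the serialization layer rather than the IPC itself.)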

-n

-- 
Nathaniel J. Smith -- https://vorpus.org

