[issue20540] Python 3.3/3.4 regression in multiprocessing manager ?

Irvin Probst report at bugs.python.org
Fri Feb 7 13:06:28 CET 2014


New submission from Irvin Probst:

After checking with the kind people of help at python.org I think I found a bug in the way connections are handled in the multiprocessing module.

Using python 3.3 I've noticed a performance drop of about 70 times when running some code performing basic requests on a SyncManager. As this code was burried deep into a big project I made a test case attached to this report to reproduce this behavior.

Here is what this code does:
- define a class SharedData with several instance variables (a,b and c here)
- this class has two methods exposed through a proxy (good() ans bad()) both see a huge performance drop using 3.3 and can be used to reproduce this behavior. The only difference is that good() uses a mutex whereas bad() does not, I wished to check that mutexes were not to blame for this problem.
- create a Syncmanager giving access to a SharedData instance
- launch a multiprocessing.Process() running the do_stuff() function, this function calls 10 times the good() (or bad()) method of SharedData through the Syncmanager, passes some values to it and gets back the result.
- after each call to the proxy the time elapsed, roughly measured with time.time(), is printed

System specs:
Linux turing 3.12-1-686-pae #1 SMP Debian 3.12.6-2 (2013-12-29) i686 GNU/Linux

Python version:
latests 2.6,2.7,3.2 and 3.3 from standard debian repos
3.3.0 and 3.4.0 beta3 compiled from source

time elapsed in each call to the proxy using Python 2.6, 2.7 and 3.2:
first call to proxy ~ 0.04 seconds, next calls ~0.001 sec

time elapsed in each call to the proxy using Python 3.3.0, 3.3.2, 3.3.3 , 3.4.0b3:
first call to proxy ~0.27 seconds, next calls: 0.07 sec

I reproduced this behavior using python 2.7 and 3.3.3 on an ubuntu computer running the latest amd64 release.

Of course I tried without a print() after each call but it does not change anything, python 3.3 is *much* slower here.

Using cProfile I tracked this down to the builtin read() method and indeed timing the read() syscall in posix_read() using gettimeofday() confirms it takes ages, posix_write() is fine though. I think the problem comes from the underlying socket the interpreter is reading from.

connections.py from multiprocessing has been rewrittend between 3.2 and 3.3 but I can't see what's wrong in the way it has been done, basic socket options seem to be exactly the same.

One interesting point is that it seems to only affect the last bytes sent through the socket, e.g if you send a numpy.array big enough to fill the socket's read() buffer the first calls to read() are done at a normal speed, only the last one takes time.

If you confirm my test code is not to blame it makes IMHO the SyncManager in 3.3 and 3.4 completely unusable for frequent data exchanges between processes.

Thanks for your time.

----------
components: Library (Lib)
files: test_manager.py
messages: 210457
nosy: Irvin.Probst
priority: normal
severity: normal
status: open
title: Python 3.3/3.4 regression in multiprocessing manager ?
type: performance
versions: Python 3.3, Python 3.4
Added file: http://bugs.python.org/file33956/test_manager.py

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue20540>
_______________________________________


More information about the Python-bugs-list mailing list