Sporadic failures of test_subprocess and test_multiprocessing_spawn
Hi,

Can you please take a look at the following issue and try to reproduce it?
http://bugs.python.org/issue23771

The following tests sometimes hang on the "x86 Ubuntu Shared 3.x" and "AMD64 Debian root 3.x" buildbots:

- test_notify_all() of test_multiprocessing_spawn
- test_double_close_on_error() of test_subprocess
- other sporadic failures of test_subprocess

I'm quite sure that they are regressions, maybe related to the implementation of PEP 475. In the middle of all the PEP 475 changes, I changed some functions to release the GIL on I/O, which wasn't the case before. It may be related.

Are you able to reproduce these issues? I'm unable to reproduce them on Fedora 21. Maybe they are more likely on Debian-like operating systems?

Victor
On Sat, Mar 28, 2015 at 8:39 PM, Victor Stinner <victor.stinner@gmail.com> wrote:
Are you able to reproduce these issues? I'm unable to reproduce them on Fedora 21. Maybe they are more likely on Debian-like operating systems?
I just tried it on my Debian Wheezy AMD64, not running as root. (The same computer the buildbot runs on, but the host OS rather than the virtual machine.) Unable to reproduce. Ran a full 'make test' and still no problems. Am now trying it on the buildbot itself, to see if I can recreate it.

ChrisA
On Sat, Mar 28, 2015 at 9:10 PM, Chris Angelico <rosuav@gmail.com> wrote:
Am now trying it on the buildbot itself, to see if I can recreate it.
It seems to be stalling out. I'm not sure exactly what's happening here. Running just that one file doesn't do it, but running the full test suite results in a stall.

ChrisA
Good, you are able to reproduce it. The next step is to identify the sequence of tests that reproduces it. How do you run the test suite? Are you using -j1?

Victor

On Saturday, March 28, 2015, Chris Angelico <rosuav@gmail.com> wrote:
It seems to be stalling out. I'm not sure exactly what's happening here. Running just that one file doesn't do it, but running the full test suite results in a stall.
On Sat, Mar 28, 2015 at 11:50 PM, Victor Stinner <victor.stinner@gmail.com> wrote:
Good, you are able to reproduce it. The next step is to identify the sequence of test to reproduce it. How do you run the test suite? Are you using -j1?
I just ran 'make test'. Early in the output are these lines:

./python ./Tools/scripts/run_tests.py
/root/build/python -W default -bb -E -R -W error::BytesWarning -m test -r -w -j 0 -u all,-largefile,-audio,-gui

(The build was done in /root/build, fwiw.) So it seems to be -j0.

ChrisA
Hi,

2015-03-28 12:26 GMT+01:00 Chris Angelico <rosuav@gmail.com>:
It seems to be stalling out. I'm not sure exactly what's happening here. Running just that one file doesn't do it, but running the full test suite results in a stall.
OK, I reproduced the issue on David's buildbot. A (child) process was stuck in _Py_open(), a function called from _close_open_fds_safe(), which is called when running a child process. Calling _Py_open() is not safe here because the GIL may or may not be held: after fork, the state of the GIL is unclear.

I fixed the issue by replacing _Py_open() with _Py_open_noraise(). This version doesn't use the GIL: it doesn't release the GIL to call open(), and it doesn't raise an exception.
https://hg.python.org/cpython/rev/2e1234208bab

Thanks David for the buildbot.

Victor
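The GIL is internal to CPython and can't be manipulated from Python code, so the sketch below uses an ordinary threading.Lock as a stand-in to illustrate the general hazard behind "after fork, the state of the GIL is unclear". fork() copies only the calling thread, so a lock held by any other thread at fork time is copied in its locked state, and nothing in the child will ever release it. (This is also why POSIX only allows async-signal-safe functions in the child of a multi-threaded process until exec.) This is an analogy, not the C code of _posixsubprocess; the function names are hypothetical and it assumes a POSIX system.

```python
import os
import threading
import time
import warnings


def child_can_acquire(lock, timeout=0.5):
    """Fork, try to take `lock` in the child, and report the result via the exit status."""
    with warnings.catch_warnings():
        # Python 3.12+ emits a DeprecationWarning when forking a multi-threaded
        # process, precisely because of the hazard demonstrated here.
        warnings.simplefilter("ignore", DeprecationWarning)
        pid = os.fork()
    if pid == 0:
        # Child: only the thread that called fork() exists in this process.
        acquired = lock.acquire(timeout=timeout)
        os._exit(0 if acquired else 1)  # exit immediately, skipping cleanup in the child
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0


def lock_held_by_other_thread_at_fork(hold_seconds=1.0):
    """Fork while a second thread holds a lock; can the child still acquire it?"""
    lock = threading.Lock()
    held = threading.Event()

    def holder():
        with lock:
            held.set()                # the lock is now held by this thread
            time.sleep(hold_seconds)  # keep holding it while the main thread forks

    t = threading.Thread(target=holder)
    t.start()
    held.wait()
    # The child gets a copy of the lock in its locked state but no copy of the
    # holder thread, so nothing in the child will ever release it.
    result = child_can_acquire(lock)
    t.join()
    return result
```

`child_can_acquire(threading.Lock())` succeeds, while `lock_held_by_other_thread_at_fork()` fails: the child's acquire only returns at all because of the timeout. A plain `lock.acquire()` would block forever, an indefinite wait analogous to the child stuck in _Py_open().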
On 28.03.15 11:39, Victor Stinner wrote:
Are you able to reproduce these issues? I'm unable to reproduce them on Fedora 21. Maybe they are more likely on Debian-like operating systems?
Just run tests with a low memory limit:

(ulimit -v 60000; ./python -m test.regrtest -uall -v test_multiprocessing_spawn;)

test_io also hangs.
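The shell command above caps the virtual address space of the test run. A minimal Python sketch of the same setup, assuming Linux: `ulimit -v` takes KiB, while `RLIMIT_AS` is expressed in bytes. The helper name `run_with_vmem_limit` is hypothetical. Note that `preexec_fn` runs Python code in the child between fork() and exec(), which the subprocess documentation warns is not safe when the parent has threads; this is fine for a single-threaded driver script.

```python
import resource
import subprocess
import sys


def run_with_vmem_limit(cmd, limit_kib, timeout=None):
    """Run `cmd` with its address space capped, like `ulimit -v LIMIT_KIB`."""
    limit_bytes = limit_kib * 1024  # ulimit -v counts KiB, RLIMIT_AS counts bytes

    def set_limit():
        # Runs in the child after fork(), before exec(); keep it minimal.
        _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
        # The soft limit may not exceed an existing finite hard limit.
        new_soft = limit_bytes if hard == resource.RLIM_INFINITY else min(limit_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (new_soft, hard))

    return subprocess.run(cmd, preexec_fn=set_limit, capture_output=True,
                          text=True, timeout=timeout)
```

For example, `run_with_vmem_limit([sys.executable, "-m", "test.regrtest", "-uall", "-v", "test_multiprocessing_spawn"], 60000, timeout=600)` mirrors the command above, with `sys.executable` standing in for ./python. With a `timeout`, `subprocess.run()` kills the child and raises `subprocess.TimeoutExpired`, turning a hang into an error.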
Hi Serhiy,

2015-03-28 17:40 GMT+01:00 Serhiy Storchaka <storchaka@gmail.com>:
Just run tests with low memory limit.
(ulimit -v 60000; ./python -m test.regrtest -uall -v test_multiprocessing_spawn;)
test_io also hangs.
I confirm that some tests using threads hang under very low memory conditions. When starting a new thread, Python doesn't correctly handle all exceptions raised during the thread's bootstrap. The "parent thread" waits for a signal from the "child thread", which never comes because the child thread has already exited.

I don't think this is a regression of mine; it has probably existed since Python 2, maybe earlier. I'm not interested in touching this fragile part of Python. Maybe the issue is easy to fix; I don't know.

Victor
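The hang described above is a lost handshake: the parent waits for a signal that a dying child thread never sends. A minimal sketch of a handshake that can't hang this way, assuming a hypothetical bootstrap step that may fail (simulated here with an injected MemoryError). This is not CPython's threading code; `start_with_handshake` and `fail_during_bootstrap` are illustrative names.

```python
import threading


def start_with_handshake(target, fail_during_bootstrap=False, timeout=5.0):
    """Start `target` in a new thread; return (thread, bootstrap_error or None)."""
    started = threading.Event()
    errors = []

    def bootstrap():
        try:
            if fail_during_bootstrap:
                # Stand-in for an allocation failure while setting up the thread.
                raise MemoryError("simulated failure during thread bootstrap")
        except BaseException as exc:
            errors.append(exc)  # recorded before the parent is woken up
            return
        finally:
            # Signal the parent on every exit path, success or failure.
            started.set()
        target()

    thread = threading.Thread(target=bootstrap)
    thread.start()
    # Second line of defence: never wait unboundedly for the handshake.
    if not started.wait(timeout):
        raise TimeoutError("new thread never reported the end of its bootstrap")
    return thread, (errors[0] if errors else None)
```

Setting the event in a `finally` block ties the signal to the bootstrap step's exit rather than to its success, so a failure is reported to the parent as an error instead of leaving it waiting forever.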
participants (3)
- Chris Angelico
- Serhiy Storchaka
- Victor Stinner