Re: [pypy-dev] Dead loop occurs when using python-daemon and multiprocessing together in PyPy 4.0.1
Well, ok, but that does not sound like a PyPy bug then - "close all existing fds and then complain that some of them are closed" is not a great approach - maybe it's a bug in python-daemon and the PEP?

On Wed, Dec 23, 2015 at 3:36 PM, hubo <hubo@jiedaibao.com> wrote:
No, the python-daemon module is critical in this problem, because it is the python-daemon module that closes the fd to /dev/urandom. When the process switches to a daemon, it forks itself and then closes all open fds (including stdin, stdout and stderr), so it also closes the fd for /dev/urandom that is used by the PyPy library. This is the standard behavior defined by https://www.python.org/dev/peps/pep-3143/#daemoncontext-objects and also the standard behavior for unix daemons. And unfortunately there is no way to prevent the fd from being closed without knowing exactly which number it is.
Without python-daemon (or similar libraries), the problem can only be reproduced by forcibly closing the fd (usually 4), which does not make much sense in real code.
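For what it's worth, a minimal sketch of that kind of forced reproduction without python-daemon (an illustration assembled from the description above, not a script posted in this thread; the fork/closerange combination imitates what a daemonizing library does):

#!/usr/bin/pypy
import os
import sys

os.urandom(16)                 # PyPy opens (and caches) a /dev/urandom fd here
pid = os.fork()
if pid == 0:                   # child: imitate a daemonizer
    os.closerange(3, 1024)     # closes the cached /dev/urandom fd as well
    try:
        os.urandom(16)         # on PyPy 4.0.1 this raised OSError (errno 9, EBADF)
    except OSError as e:
        sys.stderr.write("os.urandom failed: %s\n" % e)
    os._exit(0)
os.waitpid(pid, 0)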
2015-12-23 ------------------------------ hubo ------------------------------
*From:* Maciej Fijalkowski <fijall@gmail.com>
*Sent:* 2015-12-23 21:22
*Subject:* Re: Re: Re: [pypy-dev] Dead loop occurs when using python-daemon and multiprocessing together in PyPy 4.0.1
*To:* "hubo" <hubo@jiedaibao.com>
*Cc:* "pypy-dev" <pypy-dev@python.org>
can you reproduce the OSError problem without having the daemon module involved either?
On Wed, Dec 23, 2015 at 3:14 PM, hubo <hubo@jiedaibao.com> wrote:
I can only reproduce the *OSError* problem. Maybe the 100% CPU is not really a deadlock, but rather some kind of automatic crash report? Although it is quite easy to crash the program with os.urandom, it only stops responding when the crash happens in system libraries like multiprocessing or email.
The posix.urandom problem is quite easy to reproduce:
#!/usr/bin/pypy
import os
os.urandom(16)

def test():
    print repr(os.urandom(16))

import daemon
import sys

if __name__ == '__main__':
    with daemon.DaemonContext(initgroups=False, stderr=sys.stderr, stdout=sys.stdout):
        test()
(stderr and stdout are kept open to show console messages in the daemon; initgroups=False is a workaround for python-daemon not working on Python 2.6)
Or, with the random module:
#!/usr/bin/pypy
import random

def test():
    random.Random()

import daemon
import sys

if __name__ == '__main__':
    with daemon.DaemonContext(initgroups=False, stderr=sys.stderr, stdout=sys.stdout):
        test()

When running the script with pypy:
pypy test3.py
it crashes with OSError:

Traceback (most recent call last):
  File "test2.py", line 13, in <module>
    test()
  File "test2.py", line 6, in test
    random.Random()
  File "/opt/pypy-4.0.1-linux_x86_64-portable/lib-python/2.7/random.py", line 95, in __init__
    self.seed(x)
  File "/opt/pypy-4.0.1-linux_x86_64-portable/lib-python/2.7/random.py", line 111, in seed
    a = long(_hexlify(_urandom(2500)), 16)
OSError: [Errno 9] Bad file descriptor
It is still not clear why this causes a dead loop (or a long period of unresponsiveness) in multiprocessing (it should have thrown an ImportError), nor what the exact condition is for the /dev/urandom file descriptor problem to appear (just calling os.urandom and importing random does not reproduce the result), but I believe it is definitely linked to the problem.
2015-12-23 ------------------------------ hubo ------------------------------
*From:* Maciej Fijalkowski <fijall@gmail.com>
*Sent:* 2015-12-23 20:07
*Subject:* Re: Re: [pypy-dev] Dead loop occurs when using python-daemon and multiprocessing together in PyPy 4.0.1
*To:* "hubo" <hubo@jiedaibao.com>
*Cc:* "pypy-dev" <pypy-dev@python.org>
That's very interesting, can you produce a standalone example that does not use multiprocessing? That would make it much easier to fix the bug (e.g. os.fork followed by os.urandom failing)
On Wed, Dec 23, 2015 at 1:54 PM, hubo <hubo@jiedaibao.com> wrote:
Thanks for the response. Should I put it directly in the bug tracker?
FYI, I've located the cause: an incompatibility between python-daemon (or rather the standard unix-daemon behavior) and PyPy's *posix.urandom* implementation.
It seems that in PyPy 4.0.1, when the *random* module is loaded, a file descriptor is created for /dev/urandom. I think the PyPy implementation uses this shared descriptor to read from /dev/urandom. Sadly, when python-daemon forks the process and turns it into a unix daemon, it closes all currently open file descriptors. After that, every os.urandom call fails with OSError. I suspect the other functions of the Random class also use this file descriptor in C code and never detect that the return value is 0, which causes the dead loop.
I think the problem would be solved if the implementation re-opened the handle when it has somehow been closed.
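As a rough illustration of that idea (a sketch only, with hypothetical names, not the actual rurandom code): keep the cached descriptor, but retry once with a freshly opened one when a read fails with EBADF.

import errno
import os

_urandom_fd = [None]   # hypothetical module-level cache for the /dev/urandom fd

def urandom_with_reopen(n):
    # a real implementation would loop until n bytes have been read
    for attempt in (0, 1):
        if _urandom_fd[0] is None:
            _urandom_fd[0] = os.open("/dev/urandom", os.O_RDONLY)
        try:
            return os.read(_urandom_fd[0], n)
        except OSError as e:
            if e.errno != errno.EBADF or attempt == 1:
                raise
            _urandom_fd[0] = None   # fd was closed behind our back; reopen once

Note that this only covers the "descriptor was closed" case; as discussed later in the thread, the same number may meanwhile have been reused for an unrelated file, which an EBADF check alone cannot detect.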
multiprocessing uses random internally. There are also lots of other modules that use random, like email. The dead loop occurs when you use any of these libraries in a daemon.
2015-12-23 ------------------------------ hubo ------------------------------
*From:* Maciej Fijalkowski <fijall@gmail.com>
*Sent:* 2015-12-23 19:35
*Subject:* Re: [pypy-dev] Dead loop occurs when using python-daemon and multiprocessing together in PyPy 4.0.1
*To:* "hubo" <hubo@jiedaibao.com>
*Cc:* "pypy-dev" <pypy-dev@python.org>
Hi hubo
Can you put it as a bug report? Those things get easily lost on the mailing list (and sadly I won't look at it right now, multiprocessing scares me)
On Wed, Dec 23, 2015 at 12:03 PM, hubo <hubo@jiedaibao.com> wrote:
Hello devs,
A (possible) dead loop occurs when I use python-daemon and multiprocessing together in PyPy 4.0.1. It does not appear in CPython (2.6 or 2.7), nor in earlier PyPy versions (2.0.2).
*Reproduce*:
First install python-daemon: pypy_pip install python-daemon
Use the following test script (also available in attachment):
#!/usr/bin/pypy
import daemon
import multiprocessing

def test():
    q = multiprocessing.Queue(64)

if __name__ == '__main__':
    with daemon.DaemonContext():
        test()
When executing the script with pypy: pypy test.py
The background service does not exit, and is consuming 100% CPU:

ps aux | grep pypy
root      7769 99.1  0.5 235332 46812 ?      R    17:52   2:09 pypy test.py
root      7775  0.0  0.0 103252   804 pts/1  S+   17:54   0:00 grep pypy
When executing the script with CPython (python2.7 test.py), the background service exits normally.
*Environment:* I'm using CentOS 6.5, with the portable PyPy distribution for linux ( https://bitbucket.org/squeaky/portable-pypy/downloads/pypy-4.0.1-linux_x86_6... ). I ran the script with the system built-in Python (2.6.6), a compiled CPython (2.7.11), and the pypy from epel-release (pypy 2.0.2, Python 2.7.2), and the problem does not appear in any of them. Though the compiled CPython is 2.7.11 and PyPy 4.0.1 is Python 2.7.10, I think that does not matter much.
Please contact me if you have any questions or ideas.
2015-12-23 ------------------------------ hubo
Hi Maciek,

Of course it's a difference between pypy and cpython. If you close all fds, cpython's random module still works, but PyPy's doesn't.

Cheers,

Carl Friedrich
I bet the difference is due to "we see more FDs in pypy". If you close the correct FD in CPython, it would break too, I presume? Or is there special code to handle that?
Hi Maciej, On Wed, Dec 23, 2015 at 3:48 PM, Maciej Fijalkowski <fijall@gmail.com> wrote:
I bet the difference is due to "we see more FDs in pypy". If you close the correct FD in CPython, it would break too I presume? Or is there special code to handle that?
There is special code to handle that in CPython. It is actually very carefully checking that the FD is still open *and* still pointing to what it used to. See Python/random.c.

Just thinking aloud: what could occur here is that the forked process reopens an unrelated file at this descriptor, and then our os.urandom() tries to read from it---and, as a guess, the file happens to be a socket opened in non-blocking mode, and our implementation of os.urandom() gets 0 bytes and keeps trying to get more, which throws it into an infinite loop.

It seems it would be a good idea to copy the careful behavior of CPython. (There used to be another case of a file descriptor kept open internally, in rpython/rtyper/lltypesystem/llarena.py for /dev/zero, but this is gone.)

A bientôt,

Armin.
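Sketched in Python rather than C (an approximation of the check described above, not the actual Python/random.c code; the cache names are hypothetical): before reusing the cached descriptor, fstat() it and verify it still refers to the same device and inode it did when it was opened, otherwise reopen /dev/urandom.

import os

_cache = {"fd": None, "dev": None, "ino": None}

def _urandom_fd():
    fd = _cache["fd"]
    if fd is not None:
        try:
            st = os.fstat(fd)
        except OSError:
            fd = None                 # descriptor was closed behind our back
        else:
            if (st.st_dev, st.st_ino) != (_cache["dev"], _cache["ino"]):
                fd = None             # same number, but now some unrelated file
    if fd is None:
        fd = os.open("/dev/urandom", os.O_RDONLY)
        st = os.fstat(fd)
        _cache["fd"], _cache["dev"], _cache["ino"] = fd, st.st_dev, st.st_ino
    return fd

In CPython this check runs without releasing the GIL, which is the detail Armin comes back to further down the thread.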
Did you think about that removed code? https://bitbucket.org/vincentlegoll/pypy/commits/992e29624c5fd8a7e76677acf08...

The file descriptor was not kept open there, or did you mean another thing in this file?
-- Vincent Legoll
Hi Vincent, On Wed, Dec 23, 2015 at 7:05 PM, Vincent Legoll <vincent.legoll@gmail.com> wrote:
Did you think about that removed code? https://bitbucket.org/vincentlegoll/pypy/commits/992e29624c5fd8a7e76677acf08...
The file descriptor was not kept open there, or did you mean another thing in this file?
Thanks for looking it up :-) Yes, that's exactly what I have in mind, and indeed I see now that the file descriptor was not kept opened, so it didn't have the problem anyway. However rpython.rlib.rurandom line 101 keeps a file descriptor opened and reuses it without checking, which is the problem here.

A bientôt,

Armin.
I may give this an eye if no one beats me to it, just not today - time to overeat!
-- Vincent Legoll
Hi Vincent, On Thu, Dec 24, 2015 at 3:34 PM, Vincent Legoll <vincent.legoll@gmail.com> wrote:
I may give this an eye if no one beats me to it, just not today, time to overeat!
:-) In the meantime I reverted the older checkin that caches the file descriptor, with a comment, in c6be1b27fa1d.

A bientôt,

Armin.
:-) In the meantime I reverted the older checkin that caches the file descriptor, with a comment, in c6be1b27fa1d.
Are we really seeing a closed and reopened fd? And if that's the case it may even be intentional ;-) Because if we're only seeing a closed python file object, we could go the simpler way:

-        if not context[0]:
+        if (not context[0] or
+                type(context[0]) == file and context[0].closed):
             context[0] = os.open("/dev/urandom", os.O_RDONLY, 0777)

I'll try to reproduce first, then test whether this simple fix is enough...

Frohe Weihnachten !

-- Vincent Legoll
Hi Vincent, On Thu, Dec 24, 2015 at 5:50 PM, Vincent Legoll <vincent.legoll@gmail.com> wrote:
Are we really seeing a closed and reopened fd ? And if that's the case it may even be intentional ;-)
because if we're only seeing a closed python file object, we could go the simpler way:
-        if not context[0]:
+        if (not context[0] or
+                type(context[0]) == file and context[0].closed):
             context[0] = os.open("/dev/urandom", os.O_RDONLY, 0777)
I'll try to reproduce first then test if this simple fix is enough...
No, please don't commit "half fixes" like that. We know that there is a situation in which bad things occur, so we need to deal with it. Fixing half the problem might make this particular case work, but not all of them. At least the current situation is not very good performance-wise, but it should work.

A bientôt,

Armin.
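For context, the "current situation" after the revert presumably amounts to something like the following sketch (an illustration, not the literal rpython.rlib.rurandom code): open /dev/urandom afresh on every call and close it again, which is immune to descriptors being closed by a daemonizer but pays an open()/close() pair per call.

import os

def urandom_uncached(n):
    fd = os.open("/dev/urandom", os.O_RDONLY)
    try:
        buf = b""
        while len(buf) < n:
            chunk = os.read(fd, n - len(buf))
            if not chunk:
                raise OSError("unexpected end of file on /dev/urandom")
            buf += chunk
        return buf
    finally:
        os.close(fd)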
OK, I understand we want to have behavior as identical to cpython as possible. So does that mean implementing the same "fstat() & check we're still on the same inode & device as before"? See cpython's Python/random.c:208. This I have done, but I'll need help to make the test work (as in, detect that something works after the patch that did not work before)...

Do we also need to do the "could have been opened in another thread" dance?

And what about the CLO_EXEC thing? This does seem sensible to do too, albeit not directly tied to the bug, even if it would make it happen in more cases (I think).
-- Vincent Legoll
Hi Vincent, On Sat, Dec 26, 2015 at 1:45 PM, Vincent Legoll <vincent.legoll@gmail.com> wrote:
OK, I understand we want to have behavior as identical to cpython as possible.
So does that mean implementing the same : "fstat() & check we're still on the same inode & device as before" ? Do we also need to do the "could have been opened in another thread" dance ? And what about the CLO_EXEC thing, this does seem sensible to do too,
This is what I started to do, but I stopped because there are additional issues on PyPy: you need to make sure that the GIL is not released at places where CPython does not release it either. Otherwise, you're opening the code to "race conditions" again.

The goal would be to be *at least* as safe as CPython. There are a lot of corner cases that are not discussed in the source code of CPython. I'm pretty sure that some of them are possible (but rare). As an extreme example, if one thread does os.urandom() and another thread does os.close(4) in parallel, where 4 happens to be the file descriptor returned by open() in urandom.c, then it is possible that the open()'s result is closed after open() but before urandom.c reacquires the GIL. Then urandom.c gets a closed file descriptor. If additionally the other thread (or a 3rd one) opens a different file at file descriptor 4, then urandom.c will think it successfully opened /dev/urandom but actually the file descriptor is for some unrelated file.

However, this is probably an issue that cannot be dealt with at all on Posix even in C. That would mean that it is always wrong to close() unknown file descriptors from one thread when other threads might be running... This would not hurt being clarified.

A bientôt,

Armin.
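The "unrelated file at the same number" part is easy to see in isolation, since POSIX open() returns the lowest-numbered free descriptor (a standalone illustration, independent of the PyPy or CPython code):

import os

fd = os.open("/dev/urandom", os.O_RDONLY)
os.close(fd)                                  # e.g. a daemonizer or another thread closes it
other = os.open("/etc/hosts", os.O_RDONLY)    # any unrelated file opened afterwards
assert other == fd                            # the freed number is reused immediately
# A cached copy of `fd` now silently refers to /etc/hosts, not /dev/urandom.
os.close(other)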
So to stay on the safe side, you prefer to keep reopening every time? Fine. I've put my work here: https://bitbucket.org/vincentlegoll/pypy/commits/branch/fix-urandom-closed - it contains a test for the "close /dev/urandom fd from under our feet" case and a typo fix that you may still want to cherry-pick...

Tell me if there's something else I can do.
-- Vincent Legoll
Hi Vincent, On Sun, Dec 27, 2015 at 11:45 AM, Vincent Legoll <vincent.legoll@gmail.com> wrote:
So to stay on the safe side, you prefer to keep reopening every time ? https://bitbucket.org/vincentlegoll/pypy/commits/branch/fix-urandom-closed
I'm not against a proper fix :-) The problem is that for the fix to be proper, it needs a bit more work than what you did: the calls to os.fstat() in your code can each release the GIL, whereas in CPython the corresponding calls to fstat() are done without releasing the GIL, and I believe that is important.

A bientôt,

Armin.
Hello and happy new pypyear to all pypyers ! On Tue, Jan 5, 2016 at 10:13 AM, Armin Rigo <arigo@tunes.org> wrote:
So to stay on the safe side, you prefer to keep reopening every time ? https://bitbucket.org/vincentlegoll/pypy/commits/branch/fix-urandom-closed
I'm not against a proper fix :-) The problem is that for the fix to be proper, it needs a bit more work than what you did. The problem is that the calls to os.fstat() in your code can each release the GIL; in CPython, the corresponding calls to fstat() are done without the GIL released, and I believe it is important.
OK, for a proper fix, how can we write a test that will check for that? Is there a doc for pypy's way of grabbing the GIL?

BTW, with the always-open()ing approach, isn't the race still there, because the GIL is released between open() and read()? So the fd can be closed there too... That is certainly a smaller window, but still...

-- Vincent Legoll
I think the multi-threaded cases are not really problems. The race condition only happens when another thread closes the file descriptor, and the only known case of closing file descriptors without knowing what they are used for is while forking into a daemon. It always produces unpredictable results if you fork with multiple threads running without planning for it anyway.

As for the GIL, I think it is acceptable for PyPy to differ from CPython in very rare cases. Users should understand that the two have different implementation details. It is only important that common use cases like python-daemon work as expected.

2015-12-28

hubo

From: Armin Rigo <arigo@tunes.org>
Sent: 2015-12-27 07:42
Subject: Re: [pypy-dev] Dead loop occurs when using python-daemon and multiprocessing together in PyPy 4.0.1
To: "Vincent Legoll" <vincent.legoll@gmail.com>
Cc: "pypy-dev" <pypy-dev@python.org>
participants (5)

- Armin Rigo
- Carl Friedrich Bolz
- hubo
- Maciej Fijalkowski
- Vincent Legoll