Re: [pypy-dev] Dead loop occurs when using python-daemon and multiprocessing together in PyPy 4.0.1

well, ok, but that does not sound like a pypy bug then - "close all existing fds and then complain that some of them are closed" is not exactly good behaviour - maybe it's a bug in python-daemon and the PEP? On Wed, Dec 23, 2015 at 3:36 PM, hubo <hubo@jiedaibao.com> wrote:

Hi Maciek, Of course it's a difference between PyPy and CPython. If you close all fds, CPython's random module still works, but PyPy's doesn't. Cheers, Carl Friedrich On December 23, 2015 2:54:23 PM GMT+01:00, Maciej Fijalkowski <fijall@gmail.com> wrote:
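For concreteness, a minimal sketch of the scenario under discussion (POSIX-only and purely illustrative; the descriptor range and the exact failure mode will vary): call os.urandom() once so any interpreter-level cache is primed, fork, close the inherited descriptors in the child the way a daemonizing library does, then ask for random bytes again.

    import os

    os.urandom(1)  # prime any cached /dev/urandom descriptor in the interpreter

    pid = os.fork()
    if pid == 0:
        # Child: close every inherited descriptor except stdio, roughly what
        # a daemonizing library does.
        for fd in range(3, 256):
            try:
                os.close(fd)
            except OSError:
                pass
        # CPython re-validates its cached descriptor here; an implementation
        # that blindly reuses the cached fd may instead read from whatever
        # file later lands on that descriptor number.
        os.urandom(16)
        os._exit(0)
    else:
        os.waitpid(pid, 0)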

I bet the difference is due to "we see more FDs in pypy". If you close the correct FD in CPython, it would break too I presume? Or is there special code to handle that? On Wed, Dec 23, 2015 at 3:57 PM, Carl Friedrich Bolz <cfbolz@gmx.de> wrote:

Hi Maciej, On Wed, Dec 23, 2015 at 3:48 PM, Maciej Fijalkowski <fijall@gmail.com> wrote:
There is special code to handle that in CPython. It is actually very carefully checking that the FD is still open *and* still pointing to what it used to. See Python/random.c. Just thinking aloud: what could occur here is that the forked process reopens an unrelated file at this descriptor, and then our os.urandom() tries to read from it---and, as a guess, the file happens to be a socket opened in non-blocking mode, and our implementation of os.urandom() gets 0 bytes and keeps trying to get more, which throws it into an infinite loop. It seems it would be a good idea to copy the careful behavior of CPython. (There used to be another case of file descriptor kept open internally, in rpython/rtyper/lltypesystem/llarena.py for /dev/zero, but this is gone.) A bientôt, Armin.
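As a rough Python rendering of the check being described (the real code in CPython's Python/random.c is C and runs while holding the GIL; the cache structure and names below are illustrative only): reuse the cached descriptor only if fstat() still succeeds on it and it still has the same device and inode as when it was opened.

    import os

    _cache = {"fd": -1, "stat": None}  # illustrative cache, not CPython's actual structure

    def urandom_fd():
        if _cache["fd"] >= 0:
            try:
                st = os.fstat(_cache["fd"])
            except OSError:
                _cache["fd"] = -1      # the descriptor was closed behind our back
            else:
                old = _cache["stat"]
                if (st.st_dev, st.st_ino) == (old.st_dev, old.st_ino):
                    return _cache["fd"]
                _cache["fd"] = -1      # same number, but now some unrelated file
        fd = os.open("/dev/urandom", os.O_RDONLY)
        _cache["fd"] = fd
        _cache["stat"] = os.fstat(fd)
        return fd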

Did you think about that removed code? https://bitbucket.org/vincentlegoll/pypy/commits/992e29624c5fd8a7e76677acf08... The file descriptor was not kept open there, or did you mean something else in this file? On Wed, Dec 23, 2015 at 6:24 PM, Armin Rigo <arigo@tunes.org> wrote:
-- Vincent Legoll

Hi Vincent, On Wed, Dec 23, 2015 at 7:05 PM, Vincent Legoll <vincent.legoll@gmail.com> wrote:
Thanks for looking it up :-) Yes, that's exactly what I have in mind, and indeed I see now that the file descriptor was not kept open, so it didn't have the problem anyway. However, rpython.rlib.rurandom line 101 keeps a file descriptor open and reuses it without checking, which is the problem here. A bientôt, Armin.
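For context, the caching pattern at issue looks roughly like this (paraphrased from the snippet quoted later in the thread; the real rpython.rlib.rurandom code differs in detail, and the function and variable names here are illustrative):

    import os

    def urandom_cached(context, n):
        # Problematic pattern: open /dev/urandom once, stash the descriptor in
        # `context`, and reuse it on every later call without checking that it
        # is still open and still refers to /dev/urandom.
        if not context[0]:
            context[0] = os.open("/dev/urandom", os.O_RDONLY)
        return os.read(context[0], n)

    context = [None]
    data = urandom_cached(context, 16)
    # After a daemonizing fork that closed all fds, context[0] can be a stale
    # number or an unrelated file -- subsequent reads go to the wrong place.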

Hi Vincent, On Thu, Dec 24, 2015 at 3:34 PM, Vincent Legoll <vincent.legoll@gmail.com> wrote:
I may give this an eye if no one beats me to it, just not today, time to overeat!
:-) In the meantime I reverted the older checkin that caches the file descriptor, with a comment, in c6be1b27fa1d. A bientôt, Armin.
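The reverted state corresponds to the simple always-reopen pattern, roughly as sketched below (illustrative, not the contents of c6be1b27fa1d): slower, since it pays an open()/close() per call, but immune to a cached descriptor having been closed or reused.

    import os

    def urandom_reopen(n):
        # Open /dev/urandom fresh on every call and close it afterwards.
        fd = os.open("/dev/urandom", os.O_RDONLY)
        try:
            result = b""
            while len(result) < n:
                chunk = os.read(fd, n - len(result))
                if not chunk:
                    raise OSError("unexpected end of file reading /dev/urandom")
                result += chunk
            return result
        finally:
            os.close(fd)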

:-) In the meantime I reverted the older checkin that caches the file descriptor, with a comment, in c6be1b27fa1d.
Are we really seeing a closed and reopened fd ? And if that's the case it may even be intentional ;-) because if we're only seeing a closed python file object, we could go the simpler way:

- if not context[0]:
+ if (not context[0] or
+     type(context[0]) == file and context[0].closed):
      context[0] = os.open("/dev/urandom", os.O_RDONLY, 0777)

I'll try to reproduce first then test if this simple fix is enough... Frohe Weihnachten ! -- Vincent Legoll

Hi Vincent, On Thu, Dec 24, 2015 at 5:50 PM, Vincent Legoll <vincent.legoll@gmail.com> wrote:
No, please don't commit "half fixes" like that. We know that there is a situation in which bad things occur, so we need to deal with it. Fixing half the problem might make this particular case work, but not all of them. At least the current situation is not very good performance-wise, but it should work. A bientôt, Armin.

OK, I understand we want behavior as identical to CPython's as possible. So does that mean implementing the same "fstat() & check we're still on the same inode & device as before"? See CPython's Python/random.c:208. This I have done, but I'll need help to make the test work (as in, detect that something works after the patch that did not work before)... Do we also need to do the "could have been opened in another thread" dance? And what about the CLOEXEC thing? That does seem sensible to do too, albeit not directly tied to the bug, even if it would make the bug happen in more cases (I think). On Thu, Dec 24, 2015 at 8:28 PM, Armin Rigo <arigo@tunes.org> wrote:
-- Vincent Legoll
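An aside on the CLOEXEC point above: on POSIX, marking a descriptor close-on-exec looks like the sketch below (illustrative only; note it guards exec(), not the fork-and-close-all-fds situation this thread is about).

    import fcntl
    import os

    fd = os.open("/dev/urandom", os.O_RDONLY)
    # Set FD_CLOEXEC so the descriptor is not inherited across exec().
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)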

Hi Vincent, On Sat, Dec 26, 2015 at 1:45 PM, Vincent Legoll <vincent.legoll@gmail.com> wrote:
This is what I started to do, but I stopped because there are additional issues on PyPy: you need to make sure that the GIL is not released at places where CPython does not release it either. Otherwise, you're opening the code to "race conditions" again. The goal would be to be *at least* as safe as CPython.

There are a lot of corner cases that are not discussed in the source code of CPython. I'm pretty sure that some of them are possible (but rare). As an extreme example, if one thread does os.urandom() and another thread does os.close(4) in parallel, where 4 happens to be the file descriptor returned by open() in urandom.c, then it is possible that the open()'s result is closed after open() but before urandom.c reacquires the GIL. Then urandom.c gets a closed file descriptor. If additionally the other thread (or a 3rd one) opens a different file at file descriptor 4, then urandom.c will think it successfully opened /dev/urandom but actually the file descriptor is for some unrelated file.

However, this is probably an issue that cannot be dealt with at all on Posix, even in C. That would mean that it is always wrong to close() unknown file descriptors from one thread while other threads might be running... It would not hurt to clarify this. A bientôt, Armin.
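Purely as an illustration of the interleaving described above (not a deterministic reproducer; the descriptor number 4 and the thread bodies are made up):

    import os
    import threading

    def reader():
        # Opens /dev/urandom and reads from it; with the GIL released around
        # the underlying syscalls, another thread can slip in between them.
        os.urandom(16)

    def closer():
        # Closes a descriptor number it does not own.  If it happens to be the
        # one the reader just opened, the reader ends up on a dead descriptor,
        # or on an unrelated file if something else reopens that number.
        try:
            os.close(4)
        except OSError:
            pass

    t1 = threading.Thread(target=reader)
    t2 = threading.Thread(target=closer)
    t1.start(); t2.start()
    t1.join(); t2.join()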

So to stay on the safe side, you prefer to keep reopening every time ? Fine, I've put my work here: https://bitbucket.org/vincentlegoll/pypy/commits/branch/fix-urandom-closed it contains a test for the "close /dev/urandom fd from under our feet" and a typo fix that you may still want to cherry pick... Tell me if there's something else I can do On Sun, Dec 27, 2015 at 12:42 AM, Armin Rigo <arigo@tunes.org> wrote:
-- Vincent Legoll
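A sketch of the kind of test being described -- close the cached descriptor behind the implementation's back, make another file land on that number, and check that random bytes still come back. The urandom/context names stand in for whatever the branch actually uses; this is not the test from the branch itself.

    import os

    def check_urandom_survives_fd_theft(urandom, context):
        assert len(urandom(context, 8)) == 8
        os.close(context[0])                       # close the cached fd from under our feet
        decoy = os.open("/dev/null", os.O_RDONLY)  # likely reuses the same descriptor number
        try:
            # A fixed implementation notices the theft and reopens /dev/urandom;
            # a broken one reads EOF from /dev/null (or loops forever retrying).
            assert len(urandom(context, 8)) == 8
        finally:
            os.close(decoy)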

Hi Vincent, On Sun, Dec 27, 2015 at 11:45 AM, Vincent Legoll <vincent.legoll@gmail.com> wrote:
So to stay on the safe side, you prefer to keep reopening every time ? https://bitbucket.org/vincentlegoll/pypy/commits/branch/fix-urandom-closed
I'm not against a proper fix :-) The problem is that for the fix to be proper, it needs a bit more work than what you did: the calls to os.fstat() in your code can each release the GIL, whereas in CPython the corresponding calls to fstat() are done without releasing the GIL, and I believe that is important. A bientôt, Armin.

Hello and happy new pypyear to all pypyers ! On Tue, Jan 5, 2016 at 10:13 AM, Armin Rigo <arigo@tunes.org> wrote:
OK for a proper fix, but how can we write a test that will check for that? Is there a doc for pypy's way of grabbing the GIL? BTW, with the always-reopen approach, isn't the race still there, because the GIL is released between open() and read(), no? So the fd can be closed there too... That is certainly a smaller window, but still... -- Vincent Legoll

I think the multi-threaded cases are not really a problem. The race condition only happens when another thread closes the file descriptor, and the only known case of closing file descriptors without knowing what they refer to is during daemonization after a fork. Forking with multiple threads running always produces unpredictable results unless it is carefully planned for anyway. As for the GIL, I think it is acceptable for PyPy to differ from CPython in very rare cases; users should understand that the two have different implementation details. What matters is that common use cases like python-daemon work as expected. 2015-12-28 hubo From: Armin Rigo <arigo@tunes.org> Sent: 2015-12-27 07:42 Subject: Re: [pypy-dev] Dead loop occurs when using python-daemon and multiprocessing together in PyPy 4.0.1 To: "Vincent Legoll" <vincent.legoll@gmail.com> Cc: "pypy-dev" <pypy-dev@python.org> Hi Vincent, On Sat, Dec 26, 2015 at 1:45 PM, Vincent Legoll <vincent.legoll@gmail.com> wrote:

participants (5)
- Armin Rigo
- Carl Friedrich Bolz
- hubo
- Maciej Fijalkowski
- Vincent Legoll