[issue8865] select.poll is not thread safe
New submission from Christian Schubert <bugs@apexo.de>: invoking select.poll.poll() concurrently from multiple threads frequently yields garbage in one thread: while poll_poll in thread 1 is parsing its result, another thread 2 calling poll may overwrite revents; assuming poll_result was 1 in thread 1 and thread 2 managed to clear all revents before thread 1 started scanning ufds, thread 1 would iterate straight through all ufds, past its bounds (no bound checks there), and return the first out-of-bounds entry that happens to have revents != 0 this issue needs at least documentation (although bounds-checking to prevent garbage in the result wouldn't hurt) also, since there doesn't seem to be any locking w/ regards to ufds, it might be possible to corrupt python's heap, by concurrently invoking poll_register and poll_poll. poll_register could move the ufds array to another location while resizing it and poll_poll would subsequently overwrite memory that is not allocated anymore or allocated by someone else (did not test that) python 2.5.5 ---------- assignee: docs@python components: Documentation, Library (Lib) messages: 106815 nosy: apexo, docs@python priority: normal severity: normal status: open title: select.poll is not thread safe type: behavior versions: Python 2.5 _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue8865> _______________________________________
Antoine Pitrou <pitrou@free.fr> added the comment: Do you have a script that reproduces it? ---------- assignee: docs@python -> components: -Documentation nosy: +pitrou versions: +Python 2.6, Python 2.7, Python 3.1, Python 3.2 -Python 2.5 _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue8865> _______________________________________
Christian Schubert <bugs@apexo.de> added the comment: okay, I've managed to put together a script that fairly consistently reproduces the problem on my (core2 duo 2,something GHz) machine some parameters (sleep times) are randomized in each iteration, the script runs for 30 iterations, each for 1 second, and counts the number of events where the fd looks bogus (max 1 per iteration); some other events (fd correct, revents bogus are also reported, but not counted) example output: poll returned [(-3, 65535)], we were looking for fd 3 poll returned [(-3, 65535)], we were looking for fd 3 poll returned [(-3, 65535)], we were looking for fd 3 poll returned [(-3, 65535)], we were looking for fd 3 poll returned [(-3, 65535)], we were looking for fd 3 poll returned [(-1, 65535)], we were looking for fd 3 poll returned [(-1, 65535)], we were looking for fd 3 poll returned [(-1627358347, 60497)], we were looking for fd 3 should not get here; poll returned [(3, 0)] poll returned [(-3, 65535)], we were looking for fd 3 9 events in 30 iterations ---------- Added file: http://bugs.python.org/file17510/select_threads.py _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue8865> _______________________________________
Christian Schubert <bugs@apexo.de> added the comment: the other issue (poll_register freeing data structures that poll_poll currently uses) can be reproduced by the attached script (at least I can) depending on the parameters/circumstances it either segfaults or prints garbage results from poll, e.g. [(1684955474, 28192)]; the expected result would be [(3, 1)] ---------- Added file: http://bugs.python.org/file17511/concurrent_poll_and_register.py _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue8865> _______________________________________
Christian Schubert <bugs@apexo.de> added the comment: added a patch which fixes both issues before releasing the GIL we take a copy of the ufds pointer and its len, erasing the ufds pointer in the poll object (to make sure nobody else fiddles with it); when we're done we either but it back into the object (when it's still NULL) or we free it the update logic is modified as well, since the uptodate flag is not sufficient anymore, we now count up a tag for each modification of the poll dictionary and remember that tag together with the ufds pointer the result is: - for the single-threaded case we do little extra work (moving ufds from/to poll object) - correct for the multi-threaded case, with slightly higher overhead (one additional call to each of malloc, update_ufd_array, free), probably worse than having one poll object per thread, but not worse than allocating a new poll object each time we want to poll there still is potential for incorrect poll results (but not for memory corruption): when poll_register/poll_unregister are called exactly 2**32 times (or multiples thereof) while poll_poll is running, it will happily put back its outdated ufds pointer into the poll object when its done, this could be alleviated by changing tag to long long ... which is unlikely to wrap around anytime soon. ---------- keywords: +patch Added file: http://bugs.python.org/file17675/select.patch _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue8865> _______________________________________
Changes by Antoine Pitrou <pitrou@free.fr>: ---------- nosy: +exarkun _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue8865> _______________________________________
participants (2)
-
Antoine Pitrou
-
Christian Schubert