[pypy-issue] Issue #2530: segfault with ThreadPool, pandas (pypy/pypy)
mattip
issues-reply at bitbucket.org
Wed Apr 5 15:29:50 EDT 2017
New issue 2530: segfault with ThreadPool, pandas
https://bitbucket.org/pypy/pypy/issues/2530/segfault-with-threadpool-pandas
mattip:
I can install pandas using pip install (it takes a while to build)::
```
#!shell
pip install pandas
```
Using the PyPy 5.7.1 release on Ubuntu 16.04, I get a segfault when running this code (distilled from a pandas test). If I remove the ``from __future__ ...`` line, the segfault disappears (???)
```
#!python
from __future__ import division

from multiprocessing.pool import ThreadPool

from pandas import read_csv, read_table
from pandas.compat import BytesIO, range
import pandas.util.testing as tm


class TestMultithreadTests(object):
    engine = 'python'

    def read_csv(self, *args):
        ret = read_csv(*args, engine='python')
        return ret

    def test_multithread_stringio_read_csv(self):
        # see gh-11786
        max_row_range = 10000
        num_files = 100

        bytes_to_df = [
            '\n'.join(
                ['%d,%d,%d' % (i, i, i) for i in range(max_row_range)]
            ).encode() for j in range(num_files)]
        files = [BytesIO(b) for b in bytes_to_df]

        # read all files in many threads
        pool = ThreadPool(8)
        results = pool.map(self.read_csv, files)
        first_result = results[0]

        for result in results:
            tm.assert_frame_equal(first_result, result)


if __name__ == '__main__':
    t = TestMultithreadTests()
    t.test_multithread_stringio_read_csv()
```
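When toggling the ``from __future__ ...`` line to see whether the crash goes away, it can help to run the script in a child process and check the exit status instead of watching the terminal. The following is only a minimal sketch, not part of the original report: the helper name ``died_from_segfault`` is hypothetical, and it assumes a POSIX system, where a child killed by a signal reports a negative return code.
```
#!python
import signal
import subprocess
import sys


def died_from_segfault(argv):
    """Run argv as a child process; return True if SIGSEGV killed it.

    Hypothetical helper for illustration. Relies on the POSIX convention
    that subprocess reports death by signal N as return code -N.
    """
    returncode = subprocess.call(argv)
    return returncode == -signal.SIGSEGV
```
For example, ``died_from_segfault(['pypy', 'repro.py'])`` would run the script above under PyPy, assuming it was saved under the (hypothetical) name ``repro.py`` and that ``pypy`` is on the ``PATH``.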