[pypy-issue] Issue #2530: segfault with ThreadPool, pandas (pypy/pypy)

Wed Apr 5 15:29:50 EDT 2017

New issue 2530: segfault with ThreadPool, pandas
https://bitbucket.org/pypy/pypy/issues/2530/segfault-with-threadpool-pandas

mattip:

I can install pandas with pip (it takes a while to build):

```shell

pip install pandas
```
Using the PyPy 5.7.1 release on Ubuntu 16.04, I get a segfault when running this code (distilled from a pandas test). If I remove the `from __future__ ...` line, the segfault disappears (???).

```python
from __future__ import division
from multiprocessing.pool import ThreadPool
from pandas import read_csv, read_table
from pandas.compat import BytesIO, range
import pandas.util.testing as tm

class TestMultithreadTests(object):
    engine = 'python'

    def read_csv(self, *args):
        ret = read_csv(*args, engine='python')
        return ret

    def test_multithread_stringio_read_csv(self):
        # see gh-11786
        max_row_range = 10000
        num_files = 100

        bytes_to_df = [
            '\n'.join(
                ['%d,%d,%d' % (i, i, i) for i in range(max_row_range)]
            ).encode() for j in range(num_files)]
        files = [BytesIO(b) for b in bytes_to_df]

        # read all files in many threads
        pool = ThreadPool(8)
        results = pool.map(self.read_csv, files)
        first_result = results[0]

        for result in results:
            tm.assert_frame_equal(first_result, result)

if __name__ == '__main__':
    t = TestMultithreadTests()
    t.test_multithread_stringio_read_csv()

```
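To check the `from __future__` observation mechanically, the sketch below runs a script in a fresh interpreter twice: once as written and once with the `from __future__ import division` line stripped. It reports whether each run was killed by SIGSEGV. This is a hypothetical helper, not part of PyPy or pandas. It assumes a POSIX system, where a child killed by signal N has return code -N, and a Python 3.5+ driver for `subprocess.run`. The function names, the `interpreter` argument and the timeout are illustrative choices.

```python
import os
import signal
import subprocess
import sys
import tempfile

# Hypothetical constant: the exact line the report says toggles the crash.
FUTURE_LINE = "from __future__ import division"


def strip_future_division(source):
    """Return `source` with the `from __future__ import division` line removed."""
    kept = [line for line in source.splitlines() if line.strip() != FUTURE_LINE]
    return "\n".join(kept) + "\n"


def crashed_with_segfault(source, interpreter=sys.executable, timeout=600):
    """Run `source` in a fresh `interpreter`; True if the child died from SIGSEGV.

    Raises subprocess.TimeoutExpired if the child runs longer than `timeout` seconds.
    """
    fd, path = tempfile.mkstemp(suffix=".py")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(source)
        proc = subprocess.run(
            [interpreter, path],
            timeout=timeout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        # On POSIX a negative return code -N means "terminated by signal N".
        return proc.returncode == -signal.SIGSEGV
    finally:
        os.remove(path)


def compare(source, interpreter=sys.executable):
    """Run the script with and without the __future__ import and report crashes."""
    return {
        "with_future": crashed_with_segfault(source, interpreter),
        "without_future": crashed_with_segfault(
            strip_future_division(source), interpreter
        ),
    }
```

For example, `compare(open("repro.py").read(), interpreter="pypy")` would run the repro above under PyPy in both variants. Here `repro.py` and the `pypy` executable name are assumptions about the local setup. A crash that only shows up under threads may be nondeterministic, so a single run of each variant is weak evidence; repeating each run several times gives a more trustworthy comparison.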