Segfault using "fromstring" and reading variable length string
Hello,

Given this piece of code (I can provide the meg file off-list for those who want to reproduce the error):

    import numpy as np
    f = open("a08A0122.341071.meg", "rb")
    dt = np.dtype([('line1', '|S80'), ('line2', np.object_), ('line3', '|S80'),
                   ('line4', '|S80'), ('line5', '|S80'), ('line6', '|S2'),
                   ('line7', np.int32, 2000), ('line8', '|S2'),
                   ('line9', np.int32, 2000), ('line10', '|S2')])
    k = np.fromstring(f.read(dt.itemsize), dt)[0]

Accessing k causes a "Segmentation fault (core dumped)" and kills my python and IPython sessions immediately.

I actually know that the culprit is "np.object_" in this case. The original field was ('line2', '|S81'); however, those meg files (a mix of text and binary content) have a funny habit of switching from 80 characters to 81 (including the "\r\n" chars). I was testing whether I could create a variable-length string dtype, which seems not to be possible.

A little more info: line2 holds time stamps, one of which is in the form 22:34:59.999. I have seen in the file that 22:34:59.999 was originally written as 22:34:59.1000, which causes the extra character. (Interestingly, the millisecond field should cycle through 0-999 and roll over after 999 rather than reach 1000, which to me indicates a slight bug.)

For this reason I can't read the whole content of those meg files, since somewhere in the middle fromstring starts reading shifted (erroneous) content.

Should I go fix that millisecond overflow first, or is there an alternative way to approach this problem?

Thanks

================================================================================
Platform : Linux-2.6.35.12-88.fc14.x86_64-x86_64-with-fedora-14-Laughlin
Python   : ('CPython', 'tags/r27', '82500')
IPython  : 0.10
NumPy    : 2.0.0.dev-2e96d91
================================================================================

-- Gökhan
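One possible alternative to a fixed-size record dtype, offered only as a sketch: read the text lines with readline(), which does not care whether a line is 80 or 81 characters long, and read only the fixed-size binary blocks through NumPy. The code below is Python 3 rather than the Python 2 used in this thread. The record layout (five text lines, then two 2000-value int32 blocks, each block preceded by a 2-byte separator and the last one followed by another) is inferred from the dtype above, and treating the '|S2' fields as "\r\n" separators is an assumption. The names read_record and make_record are hypothetical helpers, and the data is synthetic.

```python
import io
import numpy as np

N_TEXT_LINES = 5   # line1..line5 in the dtype above
N_VALUES = 2000    # length of each int32 block (line7 and line9)

def read_record(f):
    """Read one record from a binary file object; return None at end of file."""
    # readline() stops at b"\n", so an 81-character line2 is handled naturally.
    text = [f.readline() for _ in range(N_TEXT_LINES)]
    if not text[0]:
        return None
    blocks = []
    for _ in range(2):
        f.read(2)                   # line6 / line8: assumed "\r\n" separator
        raw = f.read(4 * N_VALUES)  # fixed-size binary payload
        blocks.append(np.frombuffer(raw, dtype=np.int32))
    f.read(2)                       # line10: assumed trailing separator
    return [t.rstrip(b"\r\n") for t in text], blocks

def make_record(line2_len):
    """Build a synthetic record whose second line is line2_len bytes long."""
    lines = [b"a" * 78 + b"\r\n",
             b"t" * (line2_len - 2) + b"\r\n"] + [b"b" * 78 + b"\r\n"] * 3
    data = np.arange(N_VALUES, dtype=np.int32).tobytes()
    return b"".join(lines) + b"\r\n" + data + b"\r\n" + data + b"\r\n"

# Two records back to back: the second has the 81-byte line2.
stream = io.BytesIO(make_record(80) + make_record(81))
first = read_record(stream)
second = read_record(stream)
print(len(first[0][1]), len(second[0][1]), second[1][0].shape)
```

Because the binary blocks are read by byte count rather than by line, newline bytes that happen to occur inside the int32 data do not disturb the parsing. With a real file, the BytesIO object would be replaced by a file opened in "rb" mode.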
On Thu, Apr 21, 2011 at 10:06 PM, Gökhan Sever wrote:
> Hello, Given this piece of code (I can provide the meg file off-the list for
> those who wants to reproduce the error)

Can you instead construct a test as simple as possible for this? It sounds like you need only a two line string to reproduce this. The bug sounds similar to http://projects.scipy.org/numpy/ticket/1689.

Ralf
On Fri, Apr 22, 2011 at 12:37 PM, Ralf Gommers wrote:
> Can you instead construct a test as simple as possible for this? It sounds
> like you need only a two line string to reproduce this. The bug sounds
> similar to http://projects.scipy.org/numpy/ticket/1689.
This simple case segfaults as well (the commented line works correctly):

    import numpy as np
    from StringIO import StringIO
    c = StringIO(" hello \r\n world \r\n")
    dt = np.dtype([('line1', '|S6'), ('line2', np.object_)])
    #dt = np.dtype([('line1', '|S9'), ('line2', '|S9')])
    k = np.fromstring(c.read(dt.itemsize), dt)[0]

-- Gökhan
On 4/22/2011 2:52 PM, Gökhan Sever wrote:
> This simple case segfaults as well (The commented line works correctly):
I can reproduce the crash with a recent build of numpy 1.6. It is in arraytypes.c.src, line 521:

    static PyObject *
    OBJECT_getitem(char *ip, PyArrayObject *ap)
    {
        PyObject *obj;
        NPY_COPY_PYOBJECT_PTR(&obj, ip);
        if (obj == NULL) {
            Py_INCREF(Py_None);
            return Py_None;
        }
        else {
            Py_INCREF(obj);  /* <== crash */
            return obj;
        }
    }

There's no check whether obj is a valid PyObject (it is not in this case).

Christoph
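To illustrate why obj cannot be a valid PyObject here: with the dtype from the simple test case, the bytes that follow the 6-byte line1 field end up in the pointer slot of the object field and are later treated as an address. The Python 3 sketch below only decodes those bytes the way a native pointer would be read. The variable names are illustrative, the offset of 6 assumes the unaligned layout of that dtype, and the pointer width is whatever the struct module reports for the local platform.

```python
import struct

data = b" hello \r\n world \r\n"      # same content as the StringIO test case
ptr_size = struct.calcsize("P")       # size of a native pointer (void *)

# line1 is '|S6', so the object field starts at byte offset 6.
raw = data[6:6 + ptr_size]

# Interpret the text bytes as a native pointer value.
(address,) = struct.unpack("P", raw)
print(repr(raw), hex(address))
```

The resulting number is just text reinterpreted as an address. It is not NULL, so the `obj == NULL` branch is skipped and Py_INCREF writes to whatever memory that address happens to name.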
On Fri, Apr 22, 2011 at 5:08 PM, Christoph Gohlke wrote:
> I can reproduce the crash with a recent build of numpy 1.6. It is in
> arraytypes.c.src, line 521:
> [...]
> There's no check whether obj is a valid PyObject (it is not in this case).
I took a quick look at this issue and committed a fix. PyArray_FromString was doing a check to exclude object arrays, but that check was incorrect. Now it should appropriately raise an exception instead of creating an invalid array.

https://github.com/numpy/numpy/commit/f75bfab3a2ab74ac82047f153a36c71c58fe37...

-Mark
On Fri, Apr 22, 2011 at 6:32 PM, Mark Wiebe wrote:
> I took a quick look at this issue and committed a fix. PyArray_FromString
> was doing a check to exclude object arrays, but that check was incorrect.
> Now it should appropriately raise an exception instead of creating an
> invalid array.
Thanks for the fix. Now that line yields:

    I[6]: k = np.fromstring(c.read(dt.itemsize), dt)[0]
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    /home/gsever/Desktop/python-repo/numpy/<ipython console> in <module>()
    ValueError: Cannot create an object array from a string

-- Gökhan
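For readers on a current NumPy with Python 3, a similar guard can be observed with np.frombuffer, which is the usual replacement for the binary mode of np.fromstring. This is a sketch and not the code path fixed in this thread. It uses a zero-filled buffer, so that even without the guard the object slot would hold a NULL pointer rather than garbage. The exact error message is not assumed.

```python
import numpy as np

dt = np.dtype([("line1", "S6"), ("line2", np.object_)])
payload = b"\x00" * dt.itemsize   # zero bytes only, no bogus pointer values

try:
    np.frombuffer(payload, dtype=dt)
    rejected = False
except ValueError as exc:
    # A dtype containing an object field cannot be built from raw bytes.
    rejected = True
    print("rejected:", exc)
```

The point is the same as in the fix above: raw bytes never contain valid Python object references, so construction is refused up front instead of failing later on element access.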
participants (4)
- Christoph Gohlke
- Gökhan Sever
- Mark Wiebe
- Ralf Gommers