[Numpy-discussion] Segfault using "fromstring" and reading variable length string

Ralf Gommers ralf.gommers at googlemail.com
Fri Apr 22 14:37:46 EDT 2011

On Thu, Apr 21, 2011 at 10:06 PM, Gökhan Sever <gokhansever at gmail.com> wrote:
> Hello,
> Given this piece of code (I can provide the meg file off-list for those
> who want to reproduce the error)

Can you instead construct a test as simple as possible for this? It
sounds like you need only a two line string to reproduce this. The bug
sounds similar to http://projects.scipy.org/numpy/ticket/1689.


> import numpy as np
> f = open("a08A0122.341071.meg", "rb")
> dt = np.dtype([('line1', '|S80'), ('line2', np.object_), ('line3', '|S80'),
>                ('line4', '|S80'), ('line5', '|S80'), ('line6', '|S2'),
>                ('line7', np.int32, 2000), ('line8', '|S2'),
>                ('line9', np.int32, 2000), ('line10', '|S2')])
> k = np.fromstring(f.read(dt.itemsize), dt)[0]
>
> Accessing k causes a "Segmentation fault (core dumped)" and kills my Python
> and IPython sessions immediately.  I actually know that the culprit is
> "np.object_" in this case.  The original field was ('line2', '|S81'), but
> those meg files (a mix of text and binary content) have a funny habit of
> switching from 80 characters to 81 (including the "\r\n" chars). I was
> testing whether I could create a variable-length string dtype, which does
> not seem to be possible.
>
> A little more info: line2 holds time stamps, one of which is in the form
> 22:34:59.999. I have seen in the file that 22:34:59.999 was originally
> written as 22:34:59.1000, which causes the extra character.
> (Interestingly, the millisecond field should cycle from 0 to 999 and roll
> over after 999 rather than reach 1000, which to me indicates a slight bug.)
> For this reason I can't read the whole content of those meg files, since
> somewhere in the middle fromstring starts reading shifted (erroneous)
> content. Should I go fix that millisecond overflow first, or is there an
> alternative way to approach this problem?
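
One possible alternative, as a rough sketch only: read the text lines with
readline(), so an 80- or 81-byte line is handled the same way, and read only
the fixed-size binary blocks with np.frombuffer. The helper name read_record
is hypothetical, and the sketch assumes that every text line ends in "\r\n",
that the '|S2' fields are two-byte separators, and that the int32 data are in
native byte order. None of this is confirmed for the real meg files.

```python
import io
import numpy as np

N_VALUES = 2000  # int32 count per binary block, taken from the dtype above


def read_record(f):
    """Read one record from a binary file object (hypothetical helper)."""
    # Five text lines. readline() stops at b"\n", so the length of each
    # line does not have to be known in advance.
    text = [f.readline().rstrip(b"\r\n") for _ in range(5)]
    blocks = []
    for _ in range(2):
        f.read(2)  # assumed two-byte separator ('line6' / 'line8')
        raw = f.read(N_VALUES * 4)  # 4 bytes per int32
        blocks.append(np.frombuffer(raw, dtype=np.int32))
    f.read(2)  # assumed trailing separator ('line10')
    return text, blocks


# Synthetic record whose second line is 81 bytes long (made-up data).
line = b"x" * 78 + b"\r\n"       # 80 bytes
long_line = b"y" * 79 + b"\r\n"  # 81 bytes
data = np.arange(N_VALUES, dtype=np.int32).tobytes()
buf = io.BytesIO(line + long_line + line * 3
                 + b"\r\n" + data + b"\r\n" + data + b"\r\n")
text, blocks = read_record(buf)
print(len(text[1]), blocks[0][:3])
```

Calling read_record repeatedly on the open file would then walk through the
records without relying on a fixed dt.itemsize, so one over-long time stamp
no longer shifts everything that follows.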

